Abstract

We study the nonparametric change point estimation for common changes in the 
means of panel data. The consistency of estimates is investigated when the number of 
panels tends to infinity but the sample size remains finite. Our focus is on weighted 
denoising estimates, involving the group fused LASSO, and on the weighted CUSUM 
estimates. Due to the fixed sample size, the common weighting schemes do not guarantee
consistency under (serial) dependence and most typical weightings do not even
provide consistency in the i.i.d. setting when the noise is too dominant.

Hence, on the one hand, we propose a consistent covariance-based extension of 
existing weighting schemes and discuss straightforward estimates of those weighting 
schemes. The performance will be demonstrated empirically in a simulation study. 
consistency in the i.i.d. setting for classical weightings.
1 Introduction


The aim of this paper is to study the estimation of changes in the context of panel data.
We focus on common changes, i.e. changes that occur simultaneously in many panels (but
not necessarily in all) at the same time points, and we consider an asymptotic framework
where the number $d$ of panels tends to infinity but the panel sample size $n$ is fixed.

The analysis of change point estimation in panel data is a subject of intensive research
(in particular in econometrics) and, as discussed in Bai (2010), dates back at least to the
works of Joseph and Wolfson (1992, 1993). However, the setting $d \to \infty$, which we are
looking at, is generally not studied much in the literature concerning change point analysis,
and the settings $n \to \infty$ or $n, d \to \infty$ are far more established.

For the classical setting of $n \to \infty$ we refer to Csörgő and Horváth (1997). In
the context of panel data the setting $n, d \to \infty$ is especially popular (cf., e.g.,
Bai (2010), Horváth and Hušková (2012) and Kim (2014)). Nevertheless, the assumption
$d \to \infty$ and $n$ fixed is also quite natural (cf., e.g., Bai (2010), Bleakley and Vert (2011a),
Hadri et al. (2012) and also Peštová and Pešta (2015)). It reflects the situation where the
number of panels, i.e. the dimensionality, is much larger than the sample size.

Bai (2010) and Bleakley and Vert (2011a) mention important applications in finance,
biology and medicine where in particular the framework of common changes is appropriate:
In finance such changes may occur simultaneously across many stocks, e.g. due to a credit
crisis or due to tax policy changes. In biology and medicine relevant applications are in the
study of genomic profiles within classes of patients. As mentioned in Bleakley and Vert
(2011a), the latter example fits particularly well in the $n$ fixed and $d \to \infty$ framework
because the length of panels in genomic studies is fixed but the number of panels can be
increased by raising the number of patients.

The body of literature related to change point estimation (and detection) is huge.
Hence, we do not attempt to summarize it here and refer the reader instead to the reviews
in Jandhyala et al. (2013), Aue and Horváth (2013), Frick et al. (2014) and Horváth and
Rice (2014). Change point analysis in the $d \to \infty$ and $n$ fixed setting goes back at
least to the (aforementioned) papers by Bleakley and Vert (2010, 2011a) and by Bai
(2010). Therein estimation of common changes is studied independently from different
perspectives. However, as we will see, the setups of Bleakley and Vert (2010, 2011a) and
of Bai (2010) are closely related^1.

Bai (2010) considered a least squares estimate for independent panels of linear time
series under a single change point assumption and Bleakley and Vert (2011a) developed a
weighted total variation denoising approach for the multiple change point scenario.
Furthermore, Bleakley and Vert (2011a) proposed a computationally efficient algorithm and
implemented it in a convenient MATLAB package GFLseg^2 which we also used in some
of our simulations.

In this article we study consistency properties, in particular what we define as perfect
estimation^3, for the denoising estimate and for the weighted CUSUM (cumulative sums)
estimate under weak dependence. Both types of estimates depend on certain weighting

^1 Notice that Bleakley and Vert (2011a) is a revised version of Bleakley and Vert (2010). Hence, we will
mostly refer to the more recent article.

^2 Download is available at http://cbio.ensmp.fr/GFLseg and is licensed under the GNU General Public
License.

^3 See Subsection 2.2.3 and (2.20) below.
schemes $w$. Two schemes, $w^{\mathrm{simple}}$ and $w^{\mathrm{standard}}$, were already considered by Bleakley
and Vert (2011a) for the denoising approach in the $n$ fixed and $d \to \infty$ setting (cf.
Subsection 2.2.2 for the precise definition). They showed that $w^{\mathrm{standard}}$ ensures perfect
estimation and therefore has better consistency properties for $d \to \infty$ than $w^{\mathrm{simple}}$ does.
(Notice that Bai (2010) showed perfect estimation for the least squares estimate, which
corresponds to the weighted CUSUM estimate with $w^{\mathrm{standard}}$.)

We pick up the ideas of Bleakley and Vert (2011a) and extend them in various directions
which will shed some new light on weighting schemes in general. First, we will
emphasize the connection between the total variation denoising approach and the weighted
CUSUM estimates. Notice that Bleakley and Vert (2011a) assumed independent panels
of independent Gaussian observations. We continue by showing that their consistency
results hold true under much weaker distributional assumptions, e.g. for panels of non-Gaussian
time series with common factors. This is important since many datasets are
neither Gaussian nor independent. An implication of our results is that $w^{\mathrm{standard}}$ generally
does not provide consistency for panels of time series and therefore does not ensure
perfect estimation under dependence.

As a solution, we propose a modified weighting scheme $w^{\mathrm{exact}}$, which is a generalization
of $w^{\mathrm{standard}}$ that takes the covariance structure within panels into account. We show that
this is the only choice that may generally ensure perfect estimation and derive quite mild
conditions under which $w^{\mathrm{exact}}$ indeed ensures this property. In a detailed simulation
study we confirm our results and demonstrate the gain in accuracy of $w^{\mathrm{exact}}$. Moreover,
we show that our approach outperforms the classical schemes even in random change
point settings and for rather moderate dimensions. In practice, the weights $w^{\mathrm{exact}}$ have
to be estimated. Therefore, we discuss feasible approaches and show their applicability in
simulations.

Complementary to the study of perfect estimation, we investigate consistent estimation
for a further class of weights $w^{\mathrm{weighted}}$, which contains $w^{\mathrm{simple}}$ and $w^{\mathrm{standard}}$ as special
cases, and characterize changes which are (not) correctly estimated as $d \to \infty$.

1.1 Basic setup 

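We observe $d$ panels $\{Y_{i,k}\}_{i=1,\dots,n}$ for $k = 1,\dots,d$ in a signal plus noise model where

$$Y_{i,k} = m_{i,k} + (\varepsilon_{i,k} + \gamma_k \zeta_i). \tag{1.1}$$

Here, $\{m_{i,k}\}_{i,k \in \mathbb{N}}$ is an array of deterministic signals and $\{\varepsilon_{i,k}\}_{i,k \in \mathbb{N}}$ is an array of
random centered noises. The $\{\zeta_i\}_{i \in \mathbb{N}}$ are the so-called common factors which are assumed
to be random, centered and independent of $\{\varepsilon_{i,k}\}_{i,k \in \mathbb{N}}$. Their effect on the $k$-th panel is
quantified via the deterministic factor loadings $\gamma_k \in \mathbb{R}$.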

We assume a (multiple) common change points scenario given by

$$m_{i,k} = \begin{cases}
\mu_{1,k}, & i = 1,\dots,u_1, \\
\mu_{2,k}, & i = u_1+1,\dots,u_2, \\
\;\vdots \\
\mu_{P+1,k}, & i = u_P+1,\dots,n,
\end{cases} \tag{1.2}$$

where we call $u_1,\dots,u_P \in \mathbb{N}$ change points. The $\mu_{l,k}$, $l = 1,\dots,P+1$, describe
the piecewise constant signals in each panel, i.e. the means of the observations. In other
words, the means jump simultaneously from levels $m_{u,k}$ to levels $m_{u+1,k}$ in all panels
$k = 1,\dots,d$ at change points $u \in \{u_1,\dots,u_P\}$. However, we do not require
$m_{u,k} \neq m_{u+1,k}$ to hold for all $k = 1,\dots,d$, i.e. the changes do not have to occur in all panels.
Later on we will impose more specific conditions on the average magnitude of changes.

Subsequently, we assume that $n \geq 3$ since otherwise the model (1.2) is not reasonable:
for $n = 1$ the model may not contain any change and for $n = 2$ it holds trivially
that $P = 1$ with $u_1 = 1$.
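As an illustration only (it is not part of the original analysis), data from model (1.1) with a single common change can be generated as in the following sketch. The function name `simulate_panels`, the Gaussian noise, the identical change size `delta` in every panel and the shared loading `gamma` are assumptions made for this example.

```python
import random

def simulate_panels(n, d, u, delta, gamma=0.0, noise_sd=1.0, seed=0):
    """Return Y as a list of n rows (time points), each holding d panel values.

    Hypothetical simplifications: one common change after time u, mean 0
    before and delta afterwards in every panel, i.i.d. Gaussian noise and
    the same factor loading gamma for all panels.
    """
    rng = random.Random(seed)
    zeta = [rng.gauss(0.0, 1.0) for _ in range(n)]   # common factors
    Y = []
    for i in range(1, n + 1):
        mean = 0.0 if i <= u else delta               # piecewise constant signal
        Y.append([mean + noise_sd * rng.gauss(0.0, 1.0) + gamma * zeta[i - 1]
                  for _ in range(d)])
    return Y

Y = simulate_panels(n=10, d=200, u=4, delta=1.0)
```

Here `n` stays small while `d` can be increased, which mirrors the asymptotic framework of fixed sample size and a growing number of panels.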

1.2 Notation 

We follow the compact matrix notation of Bleakley and Vert (2011a) and represent the
model (1.2) as

$$Y = M + E,$$

with a deterministic matrix $M$ of means with $M_{i,k} = m_{i,k}$ and a random matrix of
errors $E$ with $E_{i,k} = \varepsilon_{i,k} + \gamma_k \zeta_i$. Now, let $X$ be any $n \times d$ matrix. To shorten the
notation we write $X_{\bullet,j}$ for $[X_{1,j},\dots,X_{n,j}]^T$ and $X_{i,\bullet}$ for $[X_{i,1},\dots,X_{i,d}]$. For example,
$Y_{\bullet,k}$ represents the $k$-th panel and a common change at $u$ corresponds to

$$\Delta = M_{u+1,\bullet} - M_{u,\bullet} \neq 0. \tag{1.3}$$

$\|\cdot\|_F$ denotes the Frobenius norm and $\|\cdot\|_2$ stands for the Euclidean norm. We simply
write $\|\cdot\|$ for the former when no confusion is possible, unless it is stated otherwise.

We will consider functions $f(i)$ with a discrete support $i = 1,\dots,n-1$ and say
that a function $f$ is convex (or concave) if this holds true for the linear interpolation of
the points $f(i)$ on the interval $[1, n-1]$. Subsequently, we mean by $\arg\max$ the whole set
of points at which the maximum is attained.

The paper is organized as follows. In Section 2 we discuss segmentation of panel data. 
In Subsection 2.1 we introduce the concept of the denoising segmentation approach in 
general and then we turn to the single change point scenario in Subsection 2.2. First, 
we clarify the selection of a certain regularization parameter and continue to discuss the 
relation to a class of weighted CUSUM estimates in Subsection 2.2.1. Common weighting 
schemes are presented in Subsection 2.2.2. In Subsection 2.2.3 we analyze the segmentation
procedures with respect to different weighting schemes and propose a generalization
of existing approaches. Subsequently, we discuss estimates of the generalized weighting 
scheme in Subsection 2.2.4. In Section 3 we confirm our theoretical results in a simulation 
study. Finally, we provide a short summary of the paper in Section 4 and all proofs are 
postponed to Section 5. 

2 Segmentation of panel data 

We start with a description of the denoising approach to change point estimation of Bleakley
and Vert (2011a). For an overview of the related literature we refer to the references
therein. 
2.1 Total variation denoising estimates

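The total variation denoising approach to segmentation is to solve the convex minimization
problem^4

$$\min_{U \in \mathbb{R}^{n \times d}} \; \frac{1}{2}\|Y - U\|_F^2 + \lambda \times \mathrm{totvar}(U) \tag{2.1}$$

for an appropriate regularization parameter $\lambda > 0$ under a weighted total variation
penalty term

$$\mathrm{totvar}(U) = \sum_{i=1}^{n-1} \frac{\|U_{i+1,\bullet} - U_{i,\bullet}\|_2}{w(i,n)} \tag{2.2}$$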


with positive, position dependent weights $w(i,n) > 0$. We denote the solution of (2.1)
by $\hat{U}(\lambda)$ and each column $\hat{U}_{\bullet,k}$ represents the best piecewise constant fit to the panel
$Y_{\bullet,k}$ with respect to (2.1). Each change in those fits, in the sense of $\hat{U}_{u+1,\bullet} \neq \hat{U}_{u,\bullet}$, is
therefore assumed to identify a common change across panels at time point $u$. Hence, the
set $\hat{\mathcal{E}}$ of estimated change points is given by

$$\hat{\mathcal{E}}(\lambda) = \{u \mid \hat{U}_{u,\bullet}(\lambda) \neq \hat{U}_{u+1,\bullet}(\lambda)\}. \tag{2.3}$$

The penalty term $\mathrm{totvar}(U)$ is designed in such a way that $\hat{U}(\lambda)$ has for $\lambda > 0$ a
tendency to reduce the cardinality of (2.3), i.e. to reduce the number of identified change
points. Hence, $\hat{\mathcal{E}}$ has a tendency to become smaller as $\lambda$ increases.

Two extreme cases give some insight: For $\lambda \uparrow \infty$ the penalty term $\mathrm{totvar}(U)$
dominates the minimization and forces the minimizer $\hat{U}$ to be constant across rows, i.e.
we obtain $\hat{\mathcal{E}} = \emptyset$ and no change points are identified by this procedure at all. In contrast
to this, if $\lambda = 0$ then $\hat{U}(0) = Y$ and therefore $\hat{\mathcal{E}}(0) = \{u \mid Y_{u,\bullet} \neq Y_{u+1,\bullet}\}$, i.e.
the number of estimated changes corresponds to the number of different consecutive rows
of $Y$. Hence, if e.g. all rows are unique then each point $i = 1,\dots,n-1$ is identified as
a change point.
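The two extreme cases can be checked on a toy example. The sketch below evaluates the objective of (2.1), assuming that the penalty (2.2) divides each jump norm by the weight $w(i,n)$; the helper names `totvar` and `objective` and the small matrix `Y` are hypothetical.

```python
import math

def totvar(U, w):
    """Weighted total variation: sum_i ||U[i+1] - U[i]||_2 / w[i]."""
    total = 0.0
    for i in range(len(U) - 1):
        jump = math.sqrt(sum((a - b) ** 2 for a, b in zip(U[i + 1], U[i])))
        total += jump / w[i]
    return total

def objective(Y, U, lam, w):
    """0.5 * ||Y - U||_F^2 + lam * totvar(U)."""
    fit = sum((y - u) ** 2 for ry, ru in zip(Y, U) for y, u in zip(ry, ru))
    return 0.5 * fit + lam * totvar(U, w)

Y = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]        # n = 3 time points, d = 2 panels
w = [1.0, 1.0]                                   # simple weights, n - 1 entries
col_means = [sum(col) / len(Y) for col in zip(*Y)]
U_const = [col_means[:] for _ in Y]              # constant rows: no change point
```

The candidate `U = Y` has zero fitting error but pays the full penalty, whereas `U_const` pays no penalty at all, so it is preferred once `lam` is large enough.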


2.2 Single change point scenario 

Following Bleakley and Vert (2011a) we will at first restrict our considerations to the single
change point scenario^5.

Assumption 2.1. We consider a single change point scenario with a change at some time
point $u \in \{1,\dots,n-1\}$ where $n \geq 3$.

We need to clarify the selection of $\lambda$ for (2.1). In the single change point setup we
aim to select $\lambda$ as large as possible such that the set $\hat{\mathcal{E}}$ of change points contains only
one change point^6, in which case $\hat{\mathcal{E}} = \{\hat{u}\}$ and $\hat{u}$ denotes the denoising estimate for $u$.

^4 The objective function in (2.1) is strictly convex, as a sum of convex functions and due to the strict
convexity of the mapping $U \mapsto \|Y - U\|_F^2$. Moreover, we may restrict the minimization to a compact
subset. Therefore, a unique solution exists for any $\lambda > 0$.

^5 However, as will be shown in the simulations, our findings do have practical implications on the multiple
change point scenario as well, which is why we stated the general model in (1.2).

^6 For the single change point scenario Bleakley and Vert (2011a, in their software GFLseg) perform a
dichotomic search to find the "first" $\lambda$ such that $\hat{\mathcal{E}}$ contains only one element.
Figure 1: The set $\hat{\mathcal{E}}(\lambda)$, as defined in (2.3), along the $\lambda$ regularization path for $n = 100$, $d = 50$,
independent panels with independent standard Gaussian noise $\{\varepsilon_{i,k}\}$ and without common factors, i.e.
with $\gamma_k = 0$; the true change point lies at $u = 50$ with means $\mu_{1,k} = 0$ and $\mu_{2,k} = 1$, i.e. $\Delta_k = 1$, for
all panels $k = 1,\dots,d$. Notice that a change point at $i = 25$ is only estimated for $\lambda$ between 4 and 4.7.
(The computations are performed using the MATLAB package GFLseg.)


Heuristically, this forces $\hat{u}$ to be the most reasonable selection of exactly one common
change point according to the penalty term $\mathrm{totvar}(U)$.

As will be discussed in Proposition 5.1, one can identify under mild assumptions a
random interval such that any $\lambda \in (\lambda_{\min}, \lambda_{\max})$, with $0 < \lambda_{\min} < \lambda_{\max}$, yields the same
estimate $\hat{u}$ and such that any $\lambda \in [\lambda_{\max}, \infty)$ yields $\hat{\mathcal{E}} = \emptyset$, i.e. no estimate. Thus, in
the following, we tacitly assume that we select any $\lambda \in (\lambda_{\min}, \lambda_{\max})$ in which case the
corresponding estimate $\hat{u}$ is unambiguous.

Notice that generally the number of estimated change points does not necessarily
decrease monotonically in $\lambda$ for $d > 1$ (cf. Figure 1 and Section 4 of Bleakley and Vert
(2011a)) and, additionally, it does not seem clear whether any parameter $\lambda$ that identifies
only one change point yields the same estimate.

Remark 2.2. A parameter $\lambda$ that yields only one change point does not always exist.
(E.g., it holds $\hat{\mathcal{E}} = \emptyset$ for any $\lambda$ if all entries of $Y$ are equal.) We will consider situations
where such cases do not occur with probability tending to 1 as $d \to \infty$.

Remark 2.3. For selection of some reasonable $\lambda$ in the case of multiple changes, in
particular if the number of changes is unknown in advance (which is a more realistic
scenario), we refer to Bleakley and Vert (2011a) and to the references therein.


2.2.1 Relation to weighted CUSUM 

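In the next proposition we observe, based on Proposition 5.1, the relation of the estimate
$\hat{u}$ from the denoising approach to a well-known weighted CUSUM estimate^7

$$u^* \in \arg\max_{i=1,\dots,n-1} t(i), \qquad t(i) = w^2(i,n) \sum_{k=1}^{d} \Big( \sum_{j=1}^{i} (Y_{j,k} - \bar{Y}_{n,k}) \Big)^2, \tag{2.4}$$

where $\bar{Y}_{n,k}$ denotes the mean of the $k$-th panel.

^7 $u^*$ is usually defined as the smallest element in $\arg\max$. Here, we allow $u^*$ to be any element in
$\arg\max$.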
Proposition 2.4. Under Assumption 2.1 and given that $t(i)$ has a unique maximum, it
holds that $\hat{u} = u^*$ if we use the same weighting $w$ for the denoising and the CUSUM
estimates.

Note that this connection holds true only in case of a single change point whereas
otherwise denoising and CUSUM estimates differ. In this context, recall that the denoising
segmentation approach yields $P$ distinct change point estimates in case of $P$ change
points.

Proposition 2.4 allows us to study denoising and CUSUM estimates simultaneously if
we tacitly exclude the case of non unique maxima of $t(i)$ from our considerations^8. This
is not a problem, since we will focus mostly on situations where this case does not occur
with probability tending to 1 as $d \to \infty$.
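A minimal sketch of the weighted CUSUM estimate may help to fix ideas. It assumes that $t(i)$ is the squared weight times the sum over panels of squared partial sums of the panel-centered observations, which is our reading of (2.4); the function names and the noise-free toy data are hypothetical.

```python
def cusum_statistic(Y, weights):
    """t(i) = w(i)^2 * sum_k (sum_{j<=i} (Y[j][k] - mean_k))^2 for i = 1..n-1."""
    n, d = len(Y), len(Y[0])
    means = [sum(Y[j][k] for j in range(n)) / n for k in range(d)]
    partial = [0.0] * d
    t = []
    for i in range(1, n):
        for k in range(d):
            partial[k] += Y[i - 1][k] - means[k]   # running centered partial sum
        t.append(weights[i - 1] ** 2 * sum(p * p for p in partial))
    return t

def cusum_estimate(Y, weights):
    """Smallest maximiser of t(i), returned on the scale 1..n-1."""
    t = cusum_statistic(Y, weights)
    return 1 + max(range(len(t)), key=t.__getitem__)

def standard_weights(n):
    return [((i / n) * (1 - i / n)) ** -0.5 for i in range(1, n)]

# Noise-free toy panels: both levels jump after time point u = 2.
Y = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
u_star = cusum_estimate(Y, standard_weights(len(Y)))
```

Without noise both the simple and the standard weights recover the change at time point 2; differences between weightings only appear once noise enters.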

2.2.2 Common weighting schemes 

Bleakley and Vert (2011a) already studied the weightings

$$w^{\mathrm{simple}}(i,n) = 1, \qquad w^{\mathrm{standard}}(i,n) = \big((i/n)(1 - i/n)\big)^{-1/2} \tag{2.5}$$

for the denoising estimate $\hat{u}$ with respect to $d \to \infty$ and the latter has also been studied
by Bai (2010) for the least-squares estimate. Both schemes can be considered as natural
and are reasonable for the denoising and for the CUSUM estimate as well. The former,
$w^{\mathrm{simple}}$, appears to be the first choice from the point of view of the denoising approach.
In fact, Bleakley and Vert (2010) started with this case and studied $w^{\mathrm{standard}}$ in
Bleakley and Vert (2011a). On the other hand, the latter weighting, $w^{\mathrm{standard}}$, appears
to be the natural choice from the CUSUM point of view, because it can be derived via
a maximum-likelihood or a least-squares approach. Both weights are special cases of the
following parametrized scheme

$$w^{\mathrm{weighted}}(i,n) = \big((i/n)(1 - i/n)\big)^{-\gamma}, \qquad 0 \leq \gamma \leq 1/2. \tag{2.6}$$

These weights are quite popular in the field of change point analysis. Asymptotic properties
are well studied for $n \to \infty$ for testing with weighted CUSUM statistics^9, via
$\max_{i=1,\dots,n-1} t^{1/2}(i)$, or for estimating changes via $\arg\max_{i=1,\dots,n-1} t(i)$ (cf., e.g., Csörgő
and Horváth (1997)). A smaller $\gamma$ is usually expected to increase the sensitivity of testing
or estimation procedures towards change points in the middle of time series.

In the next subsection we will study estimates under the $d \to \infty$ asymptotics with
respect to the weights (2.5) and (2.6). In particular we will see limitations of (2.5) and
propose a suitable extension that has better consistency properties.

2.2.3 Theoretical analysis of weighting schemes 

For our analysis we have to impose some (homogeneous) structure on the noise $\{\varepsilon_{i,k}\}$
and on the common factors $\{\zeta_i\}$ in the next two assumptions. Therefore, let

$$S_{i,k}(\varepsilon) = n^{-1/2} \sum_{j=1}^{i} (\varepsilon_{j,k} - \bar{\varepsilon}_{n,k}) \tag{2.7}$$

be the cumulated centered noises in the $k$-th panel.

^8 If $t(i)$ has a non unique maximum, then counterexamples may be constructed such that $\hat{\mathcal{E}}(\lambda) = \{\hat{u}\}$
with $\hat{u} \in \arg\max_{i=1,\dots,n-1} t(i)$ is impossible.

^9 When dealing with CUSUM (under $n \to \infty$ asymptotics), the observations $\{Y_{i,\bullet}\}$ are usually
additionally rescaled by the long run covariance matrix.

Assumption 2.5.

1. The noise $\{\varepsilon_{i,k}\}_{i,k \in \mathbb{N}}$ is centered with finite fourth moments and the variances fulfill
$E(\varepsilon_{i,k}^2) = \sigma^2$ for some $0 < \sigma^2 < \infty$ and all $i,k$.

2. The function

$$V^2(i) = \mathrm{Var}(S_{i,k}(\varepsilon))/\sigma^2, \qquad i = 1,\dots,n-1,$$

is independent of $k$.

Assumption 2.6. The common factors $\{\zeta_i\}$ are independent of $\{\varepsilon_{i,k}\}$ and are centered
with finite fourth moments. Moreover, it holds that, as $d \to \infty$,

= ( 2 . 8 ) 

“ k=l 

Before proceeding further with the theory, we show some specific examples for the
function $V^2(i)$ and also discuss some sufficient conditions for part 2 of Assumption 2.5.
Clearly, given that part 1 of Assumption 2.5 holds true, a sufficient condition is identical
distribution of the panels $\{\varepsilon_{\bullet,k}\}_{k \in \mathbb{N}}$. The following examples will both play important
roles in our subsequent analysis.

Example 2.7 (Uncorrelated noise). Assume that part 1 of Assumption 2.5 holds true and
that $\{\varepsilon_{i,k}\}_{i \in \mathbb{N}}$ are pairwise uncorrelated for any $k$. In this situation it holds that

$$V^2(i) = (i/n)(1 - i/n). \tag{2.9}$$


Example 2.8 (Moving average noise). Another interesting case, which satisfies Assumption
2.5, is given by $\{\varepsilon_{i,k}\}_{i,k \in \mathbb{N}}$ where

$$\varepsilon_{i,k} = (\eta_{i,k} + \phi\,\eta_{i-1,k}) + \theta\,(\eta_{i,k-1} + \phi\,\eta_{i-1,k-1}) \tag{2.10}$$

for $i,k \in \mathbb{N}$, i.e. $\{\varepsilon_{i,k}\}_{i,k \in \mathbb{N}}$ are MA(1) in time and across panels. Here, we assume some
common parameters $\phi, \theta \in \mathbb{R}$ and centered i.i.d. shocks $\{\eta_{i,k}\}_{i,k \in \mathbb{N}}$ with finite fourth
moments and with $E(\eta_{i,k}^2) = s^2$, $0 < s^2 < \infty$. In this case (2.9) extends to

$$V^2(i) = C\,a(\phi)\,(i/n)(1 - i/n) - 2C\phi/n \tag{2.11}$$

with

$$a(\phi) = 1 + \phi^2 + 2\phi + 2\phi/n, \qquad \sigma^2 = s^2\,(1 + \phi^2 + \theta^2 + \phi^2\theta^2) \tag{2.12}$$

and with the constant $C = 1/(1 + \phi^2)$ that is independent of $i$.
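The closed form (2.11) can be cross-checked numerically. The sketch below assumes the normalization $S_{i,k}(\varepsilon) = n^{-1/2}\sum_{j \le i}(\varepsilon_{j,k} - \bar{\varepsilon}_{n,k})$ and uses the fact that an MA(1) series has lag-one autocorrelation $\phi/(1+\phi^2)$ and vanishing autocorrelation at larger lags; the function names are hypothetical.

```python
def v2_quadratic_form(i, n, phi):
    """V^2(i) as n^{-1} c' R c with R the MA(1) autocorrelation matrix.

    c holds the coefficients of the centered partial sum:
    1 - i/n for j <= i and -i/n for j > i.
    """
    rho1 = phi / (1 + phi ** 2)                    # lag-one autocorrelation
    c = [1 - i / n if j <= i else -i / n for j in range(1, n + 1)]
    total = sum(cj * cj for cj in c)               # diagonal of R
    total += 2 * rho1 * sum(c[j] * c[j + 1] for j in range(n - 1))
    return total / n

def v2_closed_form(i, n, phi):
    """Right-hand side of (2.11) with C = 1 / (1 + phi^2)."""
    C = 1 / (1 + phi ** 2)
    a = 1 + phi ** 2 + 2 * phi + 2 * phi / n
    return C * a * (i / n) * (1 - i / n) - 2 * C * phi / n
```

Both functions should agree up to rounding for every $i = 1,\dots,n-1$. Note that $\theta$ and the shock variance cancel in the ratio that defines $V^2(i)$, which is why they do not appear as arguments.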

The following deterministic critical functions are the cornerstone of our subsequent
analysis:

$$C_n(i; u, r) = w^2(i,n)\,\big(V^2(i)\,r + H^2(i,u)\big) \tag{2.13}$$

for $0 < r < \infty$ with

$$H(i,u) = \begin{cases}
(i/n)(1 - u/n), & i = 1,\dots,u, \\
(u/n)(1 - i/n), & i = u,\dots,n-1,
\end{cases}$$

for $u = 1,\dots,n-1$. As before, $u = u_1$ is the change point and $w(i,n)$ are the weights.
The parameter

$$r = \frac{1}{n}\,\frac{\sigma^2}{\Delta_\infty^2} \tag{2.14}$$

for $0 < \sigma^2 < \infty$ and $0 < \Delta_\infty^2 < \infty$, is the normalized noise to change ratio where

$$\Delta_\infty^2 = \lim_{d \to \infty} \frac{1}{d} \sum_{k=1}^{d} \Delta_k^2 \tag{2.15}$$

is the average change across all panels at the change point $u$. (Recall from (1.2) and (1.3)
that $\Delta_k = \mu_{2,k} - \mu_{1,k}$.) The case $\Delta_\infty^2 = 0$ of a vanishing change is excluded but will be
addressed in Remark 2.19 below. For simplicity we will write $C(i; u, n, r) = C_n(i; u, r)$.
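To see how the ratio $r$ moves the maximum of the critical function, the following sketch evaluates $C(i; u, n, r)$ under the reading $C = w^2(i,n)\,(V^2(i)\,r + H^2(i,u))$ of (2.13), with $V^2$ taken from Example 2.7. The helper names and the particular values of $n$, $u$ and $r$ are chosen for illustration only.

```python
def H(i, u, n):
    return (i / n) * (1 - u / n) if i <= u else (u / n) * (1 - i / n)

def critical(i, u, n, r, w, V2):
    """C(i; u, n, r) = w(i, n)^2 * (V2(i, n) * r + H(i, u)^2)."""
    return w(i, n) ** 2 * (V2(i, n) * r + H(i, u, n) ** 2)

def argmax_set(u, n, r, w, V2, tol=1e-12):
    """All i in 1..n-1 at which the critical function is maximal (up to tol)."""
    vals = {i: critical(i, u, n, r, w, V2) for i in range(1, n)}
    top = max(vals.values())
    return [i for i, v in vals.items() if v >= top - tol]

V2_iid = lambda i, n: (i / n) * (1 - i / n)               # Example 2.7
w_simple = lambda i, n: 1.0
w_standard = lambda i, n: ((i / n) * (1 - i / n)) ** -0.5

noisy = argmax_set(u=2, n=10, r=5.0, w=w_simple, V2=V2_iid)
clean = argmax_set(u=2, n=10, r=0.01, w=w_simple, V2=V2_iid)
robust = argmax_set(u=2, n=10, r=5.0, w=w_standard, V2=V2_iid)
```

With the simple weights the maximiser stays at $u = 2$ for a small ratio but drifts to the middle of the sample for a large one, while the standard weights keep it at $u$ in this uncorrelated setting.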


The next theorem generalizes^10 Bleakley and Vert (2011a, Lemma 1). Following their
approach we show that $C(i; u, n, r)$ is in probability the limit of rescaled $t(i)$ for all
$i = 1,\dots,n-1$ (cf. also Proposition 5.1).

Theorem 2.9. Let Assumptions 2.1, 2.5 and 2.6 be fulfilled. Assume that it holds that,
as $d \to \infty$,

$$\frac{1}{d^2} \sum_{k,q=1}^{d} \big|\mathrm{Cov}(\varepsilon_{j,k}\varepsilon_{h,k},\, \varepsilon_{l,q}\varepsilon_{p,q})\big| = o(1), \tag{2.16}$$

$$\frac{1}{d^2} \sum_{k,q=1}^{d} \alpha_{k,q}\,\big|\mathrm{Cov}(\varepsilon_{j,k},\, \varepsilon_{l,q})\big| = o(1) \tag{2.17}$$

with $\alpha_{k,q} = |\Delta_k\Delta_q| + |\gamma_k\gamma_q|$ and for all $j,h,l,p = 1,\dots,n$. Then, it holds that, as
$d \to \infty$,

$$P\Big(\arg\max_{i=1,\dots,n-1} t(i) \subseteq S\Big) \to 1, \qquad S = \arg\max_{i=1,\dots,n-1} C(i; u, n, r). \tag{2.18}$$

Theorem 2.9 immediately implies $P(u^* \in S) \to 1$. Note that, in view of Proposition
2.4, the same limiting behaviour holds true for the denoising estimate $\hat{u}$ if the maximum
of $C(i; u, n, r)$ is unique, which will be the interesting case in this article.

Remark 2.10. Let Assumptions 2.1, 2.5 and 2.6 hold true. Clearly, conditions (2.16) and
(2.17) are fulfilled if $\{\varepsilon_{i,k}\}$ are i.i.d. but deviations, in particular within panels, are also
possible. For example, if the panels $\{\varepsilon_{\bullet,k}\}$ are independent then

$$\sup_{j,h=1,\dots,n,\; k \geq 1} \mathrm{Var}(\varepsilon_{j,k}\varepsilon_{h,k}) < \infty \tag{2.19}$$

is sufficient. Condition (2.19) may be reduced further to $\mathrm{Var}(\varepsilon_{j,1}\varepsilon_{h,1}) < \infty$ for all
$j,h = 1,\dots,n$ if we additionally assume identical distribution of $\{\varepsilon_{\bullet,k}\}_{k \in \mathbb{N}}$. Notice that it is
straightforward to check that $\{\varepsilon_{i,k}\}$ from Example 2.8 fulfill conditions (2.16) and (2.17),
too.

^10 Under Gaussianity and independence within panels the results coincide up to a normalizing constant.

Perfect estimation and the exact weighting scheme 

Given any estimate $\hat{u}$ for a change point $u$, we will speak of perfect estimation^11 if
consistent estimation, i.e.

$$P(\hat{u} = u) \to 1, \tag{2.20}$$

as $d \to \infty$, holds true for all possible change points $u = 1,\dots,n-1$ and all possible
ratios $0 < r < \infty$.

Theorem 2.9 shows that under mild assumptions the stochastic limits of CUSUM
and denoising estimates are described by the deterministic critical function (2.13). This
reduces the question of consistency and of perfect estimation to an analytical problem:
Given (2.18) it is sufficient for consistency if the following Assumption A1 holds true.
Clearly, the same applies to perfect estimation if we require the assumption to hold for all
possible change points $u = 1,\dots,n-1$ and all possible ratios $0 < r < \infty$.

Assumption A1. The critical function $C(i; u, n, r)$ has a unique maximum at $i = u$
given a change point at $u$.

We proceed by studying the existence of weights which ensure perfect estimation for 
the denoising and the CUSUM estimates. The weights are tacitly assumed not to depend 
on u. 

Theorem 2.11. Let Assumption 2.1 be fulfilled and assume that $V$ is strictly positive.
Only the weights

$$w^{\mathrm{exact}}(i,n) = a/V(i), \qquad a > 0, \tag{2.21}$$

for $i = 1,\dots,n-1$ may fulfill Assumption A1 for all possible change points $u =
1,\dots,n-1$ and all ratios $0 < r < \infty$. For weights other than (2.21) there is some
change point $u$ and some ratio $r > 0$ such that the maximum of the critical function
$C(i; u, n, r)$ is not at $i = u$.

Since the estimates are independent of any scaling $a > 0$, we may restrict ourselves
to the case of $a = 1$ and consider $w^{\mathrm{exact}}$ unique. Notice that in the setting of
Example 2.7 the schemes $w^{\mathrm{exact}}$ and $w^{\mathrm{standard}}$ coincide and that we already know from
Bleakley and Vert (2011a, Theorem 3) that these weights yield perfect estimation for our
estimates in the Gaussian i.i.d. setting. However, as follows from Example 2.8, we have
generally $w^{\mathrm{exact}} \neq w^{\mathrm{standard}}$ if the noise is dependent in time. Hence, due to Theorem
2.11, weights $w^{\mathrm{standard}}$ cannot generally ensure consistency and perfect estimation in such
cases^12. Notice that the weightings $w^{\mathrm{standard}}$ and $w^{\mathrm{exact}}$ might differ fundamentally.
The former is strictly convex but the latter is even strictly concave in case of (2.11) for
$a(\phi) < 0$.

^11 Note that perfect estimation was previously defined in Torgovitski (2015) in terms of the limiting
critical function $C$.

^12 CUSUM (and denoising) estimates are not consistent if the (unique) maximum of $C(i; u, n, r)$ is not
at $i = u$.

A consequence of the next theorem, under the assumptions of Theorem 2.9, is that
covariance-based weights $w^{\mathrm{exact}}$ ensure perfect estimation under additional (but again)
quite mild assumptions on $V(i)$. Therefore, we need to find conditions such that $H(i,u)/V(i)$
has a (unique) maximum at $i = u$ for any $u = 1,\dots,n-1$.

Theorem 2.12. Let Assumption 2.1 be fulfilled. Assume that we use the weights $w^{\mathrm{exact}}$,
that $V$ is strictly positive and that $V(i) = V(n-i)$ holds true for all $i$. Then Assumption
A1 is fulfilled for all possible change points $u = 1,\dots,n-1$ and all ratios $0 < r < \infty$
if and only if

$$V(i)/i > V(u)/u, \qquad i,u = 1,\dots,n-1, \quad i < u \tag{2.22}$$

holds true. This condition is equivalent to $V(i)/i$ being strictly decreasing.
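Condition (2.22) is easy to check numerically for a given $V$. The sketch below does so for the moving average noise of Example 2.8, reading (2.11) with $C = 1/(1+\phi^2)$, and additionally verifies directly that $H(i,u)/V(i)$ is uniquely maximal at $i = u$, which is what the exact weights require. The function names and the tested parameter values are hypothetical.

```python
import math

def V(i, n, phi):
    """Square root of (2.11) for the MA(1) noise of Example 2.8."""
    C = 1 / (1 + phi ** 2)
    a = 1 + phi ** 2 + 2 * phi + 2 * phi / n
    return math.sqrt(C * a * (i / n) * (1 - i / n) - 2 * C * phi / n)

def condition_2_22(n, phi):
    """True if V(i)/i is strictly decreasing on i = 1, ..., n-1."""
    ratios = [V(i, n, phi) / i for i in range(1, n)]
    return all(x > y for x, y in zip(ratios, ratios[1:]))

def exact_weights_hit_u(n, phi):
    """True if H(i,u)/V(i) has its unique maximum at i = u for every u."""
    def score(i, u):
        h = (i / n) * (1 - u / n) if i <= u else (u / n) * (1 - i / n)
        return h / V(i, n, phi)
    return all(score(i, u) < score(u, u)
               for u in range(1, n) for i in range(1, n) if i != u)
```

Since the critical function under exact weights equals $a^2\,(r + H^2(i,u)/V^2(i))$, the ratio $r$ only shifts all values by the same amount, so no grid over $r$ is needed in the second check.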

Note that the symmetry $V(i) = V(n-i)$ for all $i = 1,\dots,n-1$ is implied by the
symmetry of covariances

$$\mathrm{Cov}\big((\varepsilon_{1,1},\dots,\varepsilon_{n,1})^T\big) = \mathrm{Cov}\big((\varepsilon_{n,1},\dots,\varepsilon_{1,1})^T\big) \tag{2.23}$$

and a sufficient condition for (2.23) is weak stationarity of $\{\varepsilon_{i,1}\}_{i \in \mathbb{N}}$. The next lemma
provides, based on concavity, a condition for (2.22) which is sometimes easier to verify
than monotonicity of $V(i)/i$ and that will also be used in Remark 2.14. It is not clear
how to state a comparable condition under convexity.

Lemma 2.13. Assume that $V(i)$ is strictly concave^13, strictly positive and that $n \geq 3$.
Then (2.22) already holds true if

$$V(1) > V(u)/u \tag{2.24}$$

holds true for all $u = 2,\dots,n-1$.

Remark 2.14. The function $V(i)$ from Example 2.8 fulfills (2.22) with any moving average
parameters $\phi, \theta \in \mathbb{R}$ and is always strictly positive for $n \geq 3$.

Now, a combination of Theorems 2.9 and 2.12 together with Remarks 2.10 and 2.14
yields the following corollary.

Corollary 2.15. Consider the moving average noise from Example 2.8 and let Assumptions
2.1 and 2.6 hold true. Using $w^{\mathrm{exact}}$ we have perfect estimation for denoising and
CUSUM estimates for any moving average parameters $\phi, \theta \in \mathbb{R}$.

Consistent estimation and the weighted weighting scheme 

We turn to the analysis of $w^{\mathrm{weighted}}$. Analogously to Peštová and Pešta (2015,
Theorem 3), our aim is to identify noise to change ratios $r$ for which consistent estimation
(2.20) does or does not hold true for the denoising and the CUSUM estimates.
We restrict the consideration to panels which fulfill all assumptions of Theorem 2.9 with
$V^2(i) = (i/n)(1 - i/n)$ from Example 2.7 (cf. Remark 2.10). We expect more restrictive
conditions on ratios $r$ for changes $u$ closer to the edges of the samples, and vice versa
less restrictive conditions if $u$ is more centered. This expectation is confirmed by the
next theorem where the minimum is taken over smaller sets in (2.25) if $u$ is closer to
$n/2$.

^13 As a discrete function linearly interpolated on the interval $[1, n-1]$.

Theorem 2.16. Let Assumption 2.1 be fulfilled. Assume $V(i)$ as in (2.9) and assume
that we use $w^{\mathrm{weighted}}$ with $\gamma \in [0,1/2)$. Define $R(u,i)$ as in (5.16) below and set

$$\mathcal{R} := \min_{n/2 < i < u^*} R(u^*, i) > 0 \tag{2.25}$$

with

$$u^* = \begin{cases}
n - u, & u = 1,\dots,\lfloor n/2 \rfloor, \\
u, & u = \lceil n/2 \rceil,\dots,n-1,
\end{cases}$$

and where $\min \emptyset = \infty$. Then Assumption A1 holds true for $0 < r < \mathcal{R}$ and does not
hold true for $\mathcal{R} < r$. In the latter case the maximum of the critical function $C(i; u, n, r)$
is not at $i = u$.


The bound in (2.25) can be evaluated numerically but to gain more insight into the
influence and the interaction of parameters, it is desirable to get explicit representations
and approximations of this expression. We will provide such approximations where we
first let $d \to \infty$ and then consider $n \to \infty$. Therefore, we have to introduce a boundary
function


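$$B(\gamma) = \begin{cases}
\dfrac{4\gamma^2 + 6\gamma^{3/2} - 3\gamma^{1/2} - 1}{8\gamma^2 + 8\gamma^{3/2} - 4\gamma^{1/2} - 2\gamma - 1}, & \gamma \in [0,1/2), \\[2ex]
2^{-1/2}, & \gamma = 1/2.
\end{cases} \tag{2.26}$$

This function $B(\gamma)$ is monotonically decreasing, continuous and $B(\gamma) \geq 2^{-1/2}$ holds
true, which can be seen as follows. It holds that $\partial_\gamma(B(\gamma^2)) = -(1 + 2\gamma)^{-2}$, which in turn
implies that $\partial_\gamma B(\gamma) < 0$ holds true for any $\gamma \in (0,1/2)$. Applying l'Hôpital's rule to
$B(\gamma^2)$ we get the continuity at $\gamma = 1/2$.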
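The stated properties of the boundary function can be checked numerically. The sketch below reads (2.26) as the ratio $(4\gamma^2 + 6\gamma^{3/2} - 3\gamma^{1/2} - 1)/(8\gamma^2 + 8\gamma^{3/2} - 4\gamma^{1/2} - 2\gamma - 1)$ and compares it with the reduced form $(1+\sqrt{\gamma})/(1+2\sqrt{\gamma})$, which we obtain by cancelling the common factor $(1+2\sqrt{\gamma})(2\gamma-1)$; this reduced form is our own simplification and does not appear in the text.

```python
import math

def B(gamma):
    """Boundary function as read from (2.26)."""
    if gamma == 0.5:
        return 2 ** -0.5
    g = math.sqrt(gamma)
    num = 4 * gamma ** 2 + 6 * g ** 3 - 3 * g - 1
    den = 8 * gamma ** 2 + 8 * g ** 3 - 4 * g - 2 * gamma - 1
    return num / den

def B_reduced(gamma):
    """(1 + sqrt(gamma)) / (1 + 2 sqrt(gamma)); an assumed simplification."""
    g = math.sqrt(gamma)
    return (1 + g) / (1 + 2 * g)

grid = [k / 100 for k in range(50)]          # gamma = 0.00, 0.01, ..., 0.49
values = [B(g) for g in grid]
```

On this grid the two forms agree, the values decrease in $\gamma$, and the reduced form evaluated at $\gamma = 1/2$ gives $2^{-1/2}$, in line with the continuity claim.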

Theorem 2.17. Let the assumptions of Theorem 2.16 hold true, set $B(\gamma)$ as in (2.26),
set $s = u^*/n$ and let $\mathcal{R} = \mathcal{R}(s,\gamma)$ be as in (2.25).

1. If $1/2 + 1/n < s < B(\gamma)$, then $\mathcal{R}(s,\gamma)$ equals the unique solution of

$$C(u^* - 1; u^*, n, r) = C(u^*; u^*, n, r) \tag{2.27}$$

and for $u^* = \lfloor n\zeta \rfloor$ with any $1/2 < \zeta < B(\gamma)$ it holds that

$$\lim_{n \to \infty} \mathcal{R}(s,\gamma) = \zeta\,(a - \zeta)(b - \zeta)(\zeta - c)^{-1} \tag{2.28}$$

with $a = 1$, $b = (\gamma - 1)/(2\gamma - 1)$, $c = 1/2$.

2. If $B(\gamma) + 2/n < s < 1$, then the unique solution to (2.27) is larger than $\mathcal{R}(s,\gamma)$.
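The limit (2.28) can be compared with the finite-$n$ solution of (2.27). Because the critical function is linear in $r$, equation (2.27) can be solved in closed form. The sketch assumes $C = w^2\,(V^2 r + H^2)$ with $V^2$ from (2.9) and the weights (2.6); all function names are hypothetical.

```python
def weight2(i, n, gamma):
    """Squared weight from (2.6)."""
    return ((i / n) * (1 - i / n)) ** (-2 * gamma)

def solve_2_27(u, n, gamma):
    """Ratio r with C(u-1; u, n, r) = C(u; u, n, r)."""
    def parts(i):
        x = i / n
        h = x * (1 - u / n)                   # H(i, u) for i <= u
        w2 = weight2(i, n, gamma)
        return w2 * x * (1 - x), w2 * h * h   # noise part, signal part
    v_prev, h_prev = parts(u - 1)
    v_u, h_u = parts(u)
    return (h_u - h_prev) / (v_prev - v_u)

def limit_2_28(zeta, gamma):
    b = (gamma - 1) / (2 * gamma - 1)
    return zeta * (1 - zeta) * (b - zeta) / (zeta - 0.5)

n = 100000
u = round(0.6 * n)
finite_n = {g: solve_2_27(u, n, g) for g in (0.0, 0.25)}
```

For $\zeta = 0.6$, which lies below $B(0) = 1$ and $B(1/4) = 3/4$, the finite-$n$ values should be close to the limits $0.96$ for $\gamma = 0$ and $2.16$ for $\gamma = 1/4$.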
In (2.28) the quantity $\mathcal{R}(s,\gamma)$, i.e. the range for $r$ that ensures consistency, becomes
larger for parameters $\gamma$ and $\zeta$ close to 1/2, which confirms our intuition. Note that
Theorem 2.17 is consistent with Bleakley and Vert (2011a, Theorem 2) since, as $n \to \infty$,
it holds

$$\mathcal{R}(s,0) = \zeta(1-\zeta)^2(\zeta - 1/2)^{-1} + o(1)$$

and $B(0) = 1$.

The weighting with $\gamma = 1/4$ can be seen as a compromise between $w^{\mathrm{standard}}$ and $w^{\mathrm{simple}}$.
For this particular $\gamma$ we are able to compute $\lim_{n \to \infty} \mathcal{R}(s, 1/4)$ for any $\zeta \in (1/2,1)$.
It would be interesting to know if such a formula could be computed for the remaining
parameters $\gamma \in (0,1/4) \cup (1/4,1/2)$ as well.

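Proposition 2.18. Let the assumptions of Theorem 2.17 hold true, set $\gamma = 1/4$ and
$B(\gamma)$ as in (2.26), i.e. $B(1/4) = 3/4$. Then it holds for $\mathcal{R} = \mathcal{R}(s, 1/4)$ that

$$\lim_{n \to \infty} \mathcal{R}(s, 1/4) = \begin{cases}
\dfrac{\zeta(1-\zeta)(3/2-\zeta)}{\zeta - 1/2}, & \zeta \in (1/2, 3/4], \\[2ex]
\dfrac{(\zeta-1)^2\big(\zeta(\zeta+1) + (\zeta/2-1)(1-\zeta)^{1/2} - 1\big)}{\zeta(\zeta-1) + (\zeta/2)(1-\zeta)^{1/2}}, & \zeta \in (3/4, 1).
\end{cases} \tag{2.29}$$

This function is continuously differentiable for $\zeta \in (1/2,1)$. The second derivative does,
however, not exist for $\zeta = 3/4$.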
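Both branches of (2.29) can be compared with a brute-force evaluation of the bound (2.25) for large $n$. Since (5.16) is not reproduced here, the sketch assumes that $R(u,i)$ is the ratio $r$ at which $C(i; u, n, r) = C(u; u, n, r)$; this assumption, the function names and the chosen $n$ are ours.

```python
def R(u, i, n, gamma=0.25):
    """Ratio r with C(i; u, n, r) = C(u; u, n, r) for i < u (assumed form of (5.16))."""
    def parts(j):
        x = j / n
        w2 = (x * (1 - x)) ** (-2 * gamma)    # squared weight from (2.6)
        h = x * (1 - u / n)                   # H(j, u) for j <= u
        return w2 * x * (1 - x), w2 * h * h
    v_i, h_i = parts(i)
    v_u, h_u = parts(u)
    return (h_u - h_i) / (v_i - v_u)

def bound(u, n):
    """Finite-n bound: minimum of R(u, i) over n/2 < i < u."""
    return min(R(u, i, n) for i in range(n // 2 + 1, u))

def limit_2_29(zeta):
    q = (1 - zeta) ** 0.5
    if zeta <= 0.75:
        return zeta * (1 - zeta) * (1.5 - zeta) / (zeta - 0.5)
    num = (zeta - 1) ** 2 * (zeta * (zeta + 1) + (zeta / 2 - 1) * q - 1)
    return num / (zeta * (zeta - 1) + (zeta / 2) * q)

n = 20000
below = bound(round(0.6 * n), n)    # zeta = 0.6 lies in the first branch
above = bound(round(0.9 * n), n)    # zeta = 0.9 lies in the second branch
```

For $\zeta \le 3/4$ the minimum is attained next to the change point, as in Theorem 2.17, whereas for $\zeta > 3/4$ it is attained at an interior point, which is why the two branches differ.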


Spurious estimation for vanishing change points 


We close this subsection by a short remark on spurious estimation for $\Delta_\infty^2 = 0$, i.e. a
probably common change that vanishes asymptotically. We also include the case $\Delta_k = 0$
for $k \geq 1$.

Remark 2.19. Let all assumptions of Theorem 2.9 hold true but with $\Delta_\infty^2 = 0$. Following
the proof of Theorem 2.9 it is clear that (2.18) also holds true in this situation with
$C(i; u, n, r) := w^2(i,n)V^2(i)$. In case of Example 2.7 and for $w^{\mathrm{weighted}}$ this yields

$$\arg\max_{i=1,\dots,n-1} C(i; u, n, r) = \begin{cases}
\{\lfloor n/2 \rfloor, \lceil n/2 \rceil\}, & \gamma \in [0,1/2), \\
\{1,\dots,n-1\}, & \gamma = 1/2,
\end{cases}$$

i.e. estimation of spurious changes. In case of Example 2.8 and using $w^{\mathrm{exact}}$ it also always
holds that $\arg\max_{i=1,\dots,n-1} C(i; u, n, r) = \{1,\dots,n-1\}$ since $C(i; u, n, r)$ is constant
in $i$.


2.2.4 Estimation of the exact weighting scheme 

In this subsection we discuss estimates of $w^{\mathrm{exact}}(i,n)$, or equivalently of $V(i)$, under the
assumptions that $V$ is strictly positive, that the sequences $\{\varepsilon_{j,k}\}_{k \in \mathbb{N}}$ and $\{\varepsilon_{j,k}\varepsilon_{l,k}\}_{k \in \mathbb{N}}$
fulfill the weak law of large numbers for any $1 \leq j, l \leq n$ and that Assumptions 2.1, 2.5
and 2.6 hold true. Notice that all conditions on the noise $\{\varepsilon_{j,k}\}$ are clearly fulfilled for
our moving average Example 2.8.
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The partial sums (2.7) can be rewritten as 


n 

i=i 


(2.30) 


with 

1-i/n, j = 

-i/n, j = i + l,...,n 

and it is straightforward to check that 

v\i) = MJ:)/a^ 

where S = Cov(y, 4 ), with functions /i(S) = for i = 1,..., n — 1. 



(2.31) 


(2.32) 


A natural estimate for $\Sigma$ is given via $\hat{\Sigma}$ with

$$\hat{\Sigma}_{j,k} = \frac{1}{d-1}\sum_{p=1}^{d} (Y_{j,p} - \bar{Y}_{j,d})(Y_{k,p} - \bar{Y}_{k,d}), \qquad (2.33)$$

where $1 \le j, k \le n$ and $d > 1$. This estimate is consistent, as $d \to \infty$, e.g. if we
assume additionally that $m_{i,k} = c_i$ always holds true for some $c_i \in \mathbb{R}$ and all $k \ge 1$,
i.e. that at all time points the means across all panels are the same.

Now, $V^2(i)$ simply may be estimated by $\hat{V}^2(i) = f_i(\hat{\Sigma})/\sigma^2$ and the corresponding
estimate for the weights will be denoted by $\hat{w}^{\text{exact}}$. Here, $\bar{Y}_{j,d}$ is the mean over all
panels at time point $j$. Recall also that our change point estimates do not depend on the
scaling of the weights. Thus, without loss of generality we may assume here
that $\sigma^2 = 1$ and technically we do not have to estimate this parameter in (2.32).

Remark 2.20. Reasonable estimates for the exact weights have to be strictly positive which
corresponds to positiveness of $f_i(\hat{\Sigma})$. This is ensured asymptotically with probability
tending to 1, as $d \to \infty$, because the estimate $\hat{\Sigma}$ is consistent and because of our usual
assumption $V(i) > 0$. (Clearly, a sufficient condition for finite $d$ would be the positive
definiteness of the estimate $\hat{\Sigma}$.)
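The plug-in estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: panels are stored as a list `Y[j][p]` (time index `j`, panel index `p`, both 0-based), $\sigma^2$ is set to 1 as suggested in the text, and the function names `sample_cov`, `f` and `exact_weights` are hypothetical.

```python
def sample_cov(Y):
    """Sample covariance across panels, cf. (2.33); Y[j][p] is time j, panel p."""
    n, d = len(Y), len(Y[0])
    means = [sum(row) / d for row in Y]
    return [[sum((Y[j][p] - means[j]) * (Y[k][p] - means[k]) for p in range(d)) / (d - 1)
             for k in range(n)]
            for j in range(n)]

def f(i, S):
    """f_i(S) = n^{-1} * sum_{j,l} a_{i,j} a_{i,l} S[j][l], with a_{i,j} from (2.31)."""
    n = len(S)
    # 0-based column j corresponds to time point j + 1, so "j < i" means "time <= i"
    a = [1.0 - i / n if j < i else -i / n for j in range(n)]
    return sum(a[j] * a[l] * S[j][l] for j in range(n) for l in range(n)) / n

def exact_weights(Y):
    """Estimated weights 1 / sqrt(f_i(S_hat)) for i = 1..n-1, or None if not positive."""
    S = sample_cov(Y)
    v2 = [f(i, S) for i in range(1, len(Y))]
    if min(v2) <= 0.0:
        return None  # caller has to fall back to another weighting, cf. Remark 2.20
    return [v ** -0.5 for v in v2]
```

Since $f_i(\hat{\Sigma})$ is a quadratic form in the sample covariance, it coincides with the sample variance (across panels) of the centered partial sums divided by $n$, which gives a convenient sanity check.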


Banded estimation based on a training period 

Now, we aim to increase the precision of the estimate for $\Sigma$ by averaging and by a
banded covariance approach. To do so we have to impose some structural assumptions:
We assume weak stationarity for $\{\varepsilon_{j,k}\}_{j\in\mathbb{N}}$ to hold in time, i.e. within each panel, and
additionally we assume to have a training period between $n_1$ and $n_2$, $1 \le n_1 < n_2 \le n$,
where the above assumption $m_{i,k} = c_i$ holds true for $i = n_1,\ldots,n_2$ and $k \ge 1$.

Within this training period we can compute consistent estimates $\hat{\Sigma}_{j,k}$ of $\mathrm{Cov}(\varepsilon_{j,1}, \varepsilon_{k,1})$,
according to (2.33), for $j, k = n_1,\ldots,n_2$ and then may average these estimates as follows

$$\hat{\gamma}_r = \frac{1}{n_2 - n_1 - r + 1}\sum_{j=n_1}^{n_2-r} \hat{\Sigma}_{j,j+r} \qquad (2.34)$$

for $r = 0,\ldots,n_2-n_1$ to gain more stability. (These estimates are consistent as well.)

Footnote: The calculations for the consistency of (2.33) are straightforward and the contribution of common factors vanishes asymptotically
due to Assumption 2.6.

Finally, the desired banded estimate $\hat{\Sigma}$ is obtained via

$$\hat{\Sigma}_{j,j+r} = \hat{\Sigma}_{j+r,j} =
\begin{cases}
\hat{\gamma}_r, & r = 0,\ldots,\min\{h, n-j\}, \\
0, & r = \min\{h, n-j\}+1,\ldots,n-j
\end{cases} \qquad (2.35)$$

for $j = 1,\ldots,n$ and with some banding parameter $h \in \{0,\ldots,n_2-n_1\}$ to be chosen.
Clearly, the corresponding estimate $\hat{V}^2(i) = f_i(\hat{\Sigma})/\sigma^2$ is consistent, as $d \to \infty$, for
panels that are MA($q$) in time if $q \le h \le n_2 - n_1$ holds true. Heuristically, it also yields
a reasonable approximation for $V^2(i) = f_i(\Sigma)/\sigma^2$ in case of stationary and weakly
dependent panels, again in the time-domain, whenever the covariances $\mathrm{Cov}(\varepsilon_{1,1}, \varepsilon_{1+r,1})$
for $r > h$ are negligible. The corresponding estimate for the weights will be denoted
by $\hat{w}^{\text{exact-banded}}$. For a data-driven selection of the banding parameter $h$ one could use
the approach of Bickel and Levina (2008, Section 5 and disp. (24)) (cf., e.g., also Wu and
Pourahmadi (2009)). In their simulations for MA(1) covariance structure, the banding
parameter is always chosen correctly, i.e. $h = 1$.
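The averaging step (2.34) and the banding step (2.35) amount to building a Toeplitz matrix from diagonal averages. The sketch below illustrates this under the assumption that an $n \times n$ covariance estimate `S` is already available (0-based indices) and that $h \le n_2 - n_1$; the function name `banded_estimate` is hypothetical.

```python
def banded_estimate(S, n1, n2, h):
    """Banded Toeplitz estimate from an n x n covariance estimate S.

    n1, n2 delimit the training period (1-based, inclusive) and h is the
    banding parameter with 0 <= h <= n2 - n1.
    """
    n = len(S)
    # (2.34): average the r-th diagonal of S within the training period
    avg = []
    for r in range(n2 - n1 + 1):
        vals = [S[j - 1][j - 1 + r] for j in range(n1, n2 - r + 1)]
        avg.append(sum(vals) / (n2 - n1 - r + 1))
    # (2.35): keep the averaged value up to lag h, set everything beyond to zero
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            r = abs(j - k)
            if r <= h:
                B[j][k] = avg[r]
    return B
```

For an MA(1)-type input the call `banded_estimate(S, 1, 4, 1)` keeps the diagonal and the first off-diagonal and discards any spurious entries at larger lags.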

The assumptions stated above are restrictive and may be questionable in applications.
In the following we (informally) discuss estimates for more complex situations
when $m_{i,k} = c_i$ for $k \ge 1$ does not hold true within a reasonable subsample. Again,
we need to assume a training period between $n_1$ and $n_2$, $1 \le n_1 < n_2 \le n$, such that
either $n_2 < u$ or $n_1 > u$ holds true, i.e. that a common change does not occur in this
subsample. Further, we need to assume stationarity and the weak law of large numbers to
hold in time for $\{\varepsilon_{j,k}\}_{j\in\mathbb{N}}$, i.e. within each panel. Now, in the first step, we center each
panel based on means computed within the training period and for each panel separately,
i.e.

$$\tilde{Y}_{i,k} := Y_{i,k} - \frac{1}{n_2 - n_1 + 1}\sum_{j=n_1}^{n_2} Y_{j,k}$$

for $i = 1,\ldots,n$ and $k = 1,\ldots,d$. In the next step (as before) we compute only estimates
$\hat{\Sigma}_{j,k}$ for $n_1 \le j \le k \le n_2$ but now based on the centered panels $\{\tilde{Y}_{j,k}\}$. Proceeding
as under (2.34) and (2.35) we obtain $\hat{V}^2(i) = f_i(\hat{\Sigma})/\sigma^2$ and the corresponding weights
$\hat{w}^{\text{exact-banded}}_{\text{centered}}$. Those are heuristically reasonable for large $n$, $d$ and a relatively large
training period which is backed up by our simulations. However, to formalize this one
would need to consider asymptotics $n, d \to \infty$ with $|n_2 - n_1| \to \infty$ which is not in the
scope of this paper.


3 Simulations 

For our simulations within the single change point scenario we have implemented the
estimates (2.4) in MATLAB. For demonstration purposes an application with a graphical
user interface can be obtained from the author or from www.mi.uni-koeln.de/~ltorgovi.
For the simulations within the multiple change point scenario we work with the MATLAB
“GFLseg”-package of Bleakley and Vert (2011b). Notice that the denoising approach may
be interpreted as a group fused LASSO (least absolute shrinkage and selection operator)
(cf. Section 5) and that the corresponding group fused LARS (least angle regression)
method yields a fast approximation to the LASSO solution (cf. Bleakley and Vert (2011a)).
In particular, we use the LASSO and LARS methods that are implemented in gflassoK.m
or in gflars.m, respectively.

Footnote (to the estimates (2.4)): We choose $\hat{u}_*$ as the smallest element in $\arg\max_i t(i)$.

Figure 2: Estimation of the change point at $u = 70$ with different weighting schemes. The parameters
are $\sigma^2 = 9$, $\phi = -3$, $\theta = 1$ and $d = 10000$.

We proceed by considering panels $\{Y_{i,k}\}$ with the following parameters: The noise
$\{\varepsilon_{i,k}\}$ is the moving average from Example 2.8 based on independent Gaussian innovations
$\{\eta_{i,k}\}$. The common factors $\{\zeta_i\}$ are chosen to be independently uniformly distributed,
centered and with the same variance as the noise $\varepsilon_{i,k}$. Unless stated otherwise, the
factor loadings $\gamma_k$ are set to [...] and the length of panels is $n = 100$. For simplicity,
we choose common changes with $\mu_{1,k} = 0$ and $\mu_{2,k} = 1$ for all $k$, i.e. $\Delta_k = 1$ and
therefore $\Delta_\infty = 1$. For $\hat{w}^{\text{exact-banded}}$ and $\hat{w}^{\text{exact-banded}}_{\text{centered}}$ the training period is chosen
as $n_1 = 1$ and $n_2 = 20$ with a bandwidth $h = 2$. (The influence of the misspecification
of $h$ is rather mild in our settings.)

All Monte Carlo simulations will be based on 100 repetitions. Notice that [...]
replaces $\hat{w}$ whenever the corresponding estimate $\hat{V} = 1/\hat{w}$ has at least one non-positive
entry (cf. Remark 2.20).

3.1 Segmentation under dependence 

Figure 2 shows one realization of the $t(i)$'s from (2.4) for different weighting schemes and the
corresponding critical curves $C(i;u,n,r)$ from (2.13), which are in probability the limits of
the $t(i)$'s as $d \to \infty$. The vertical lines indicate the locations of maxima for the respective
weightings, i.e. the positions of the estimates $\hat{u}_*$ and $u$. We see that the exact weighting provides a
correct estimation of $u = 70$ whereas $w^{\text{standard}}$ does not. For $w^{\text{simple}}$ we estimate a
more centered change point and [...] estimates a completely wrong location at the
right border.

Footnote (to the weighting schemes in Figure 2): For the sake of comparison we shift and rescale the curves by the transformation
$f(i) \mapsto \tilde{f}(i) = [f(i) - \min_{1\le j<n} f(j)]/[\max_{1\le j<n} f(j) - \min_{1\le j<n} f(j)]$.
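The shift-and-rescale transformation used for plotting the curves is a plain min-max normalisation. A minimal sketch (the function name `rescale` is hypothetical) could look as follows; note that it does not change the location of the maximum.

```python
def rescale(curve):
    """Map a curve f(1..n-1) to [0, 1] via (f - min f) / (max f - min f)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        raise ValueError("a constant curve cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in curve]
```

For instance, `rescale([2.0, 4.0, 3.0])` gives `[0.0, 1.0, 0.5]`.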

Figures 3 and 4 demonstrate the performance of the estimates with respect to different
weighting schemes. They compare the accuracy $P(\hat{u}_* = u)$, the means and the standard
deviation of the estimate $\hat{u}_*$. We ran simulations with $\theta = 1$ and $\sigma^2 = 9$ and considered
a range of parameters: $\phi = -2, -1, -0.5, 0, 1$ and $d = 50, 150, 250, 200, \ldots, 1950, 2000$.

The figures show that the change point estimate based on the (estimated) exact weighting
scheme outperforms $w^{\text{standard}}$. The former estimates all changes correctly for
arbitrary $\phi$ if we consider a sufficiently large number of panels $d$. The exact scheme is
less biased and also has overall less variation for negative $\phi$. Furthermore, we see for the chosen
parameters that the distortion due to estimation of the weights is rather mild.
However, it might be strong for other parameters, e.g. if the training period is too small.
Notice that change point estimation with $\hat{w}^{\text{exact}}$ may (surprisingly) perform even better
than with $w^{\text{exact}}$ (cf. Figure 4).

3.2 Segmentation under varying change point locations 

So far, we assumed a fixed change point at some time point u across all panels. It 
seems more realistic to allow for some minor fluctuations (around time point u) of change 
points in different panels. Therefore, Bleakley and Vert (2011a) studied the behaviour 
of their procedure under randomized change points theoretically and empirically. They 
considered changes across panels $k = 1,\ldots,d$ that are located at random change points
$u + U_k \in \{1,\ldots,n-1\}$, where $\{U_k\}$ are some i.i.d. random variables describing the
fluctuations. In Bleakley and Vert (2011a, cf. Theorem 4 and Figure 3) they showed
under appropriate assumptions that the standard weighting also works well in this setting
in the sense that the probability $P(\hat{u}_* \in u + S)$ tends to 1 as $d \to \infty$, where $S$ is
the support of $P_{U_1}$. We do not develop the theoretical analogue but show in Figure 5
empirically that, as should be expected, the exact weighting tends to be beneficial under
dependence. For this simulation we stick to the panels and simulation parameters of
Subsection 3.1 with $\sigma^2 = 9$ and $\theta = 1$. As in Bleakley and Vert (2011a), we assume
$P(U_k = \pm 2) = 0.5$ and we use the term accuracy now for $P(\hat{u}_* \in u + S)$.

3.3 Segmentation in the multiple change point scenario 

In this subsection we assume multiple change points and compare the standard weighting 
scheme with the exact one using the denoising approach. First, we discuss epidemic 
changes, i.e. we have two change points $u_1$ and $u_2$ where the means are temporarily
shifted after $u_1$ but return to their former states after $u_2$.

We performed various simulations for different change point locations $u_1$ and $u_2$,
moving average parameters $\phi$ and $\theta$ and for different variances $\sigma^2$, where we restricted
our considerations to a simplified setting with the same magnitude of changes in all panels.
We observed that the exact weighting tends to be beneficial in the sense that the overall
picture improves. This is demonstrated in Figure 6: With the exact weighting the accuracy
for $\phi < -1$ increases considerably whereas for $\phi > -1$ it only decreases slightly. The
curves are obtained using the fast LARS method but the group fused LASSO yields similar 
results. 

Footnote (to the distortion due to estimation in Subsection 3.1): The results for $\hat{w}^{\text{exact-banded}}$ are close to those for $\hat{w}^{\text{exact-banded}}_{\text{centered}}$ in this particular setting and are therefore
omitted.

Footnote (to the fluctuations in Subsection 3.2): The $\{U_k\}$ are assumed to be also independent of $\{\zeta_i, \varepsilon_{i,k} : i, k \in \mathbb{N}\}$.

Figure 3: Accuracy, mean and standard deviation for $u = 55$. $\hat{w}^{\text{exact}}$ is denoted by “exact-est” and
$\hat{w}^{\text{exact-banded}}_{\text{centered}}$ is denoted by “exact-est-banded-centered”.

[Figure 4 panels: Accuracy: standard; Accuracy: exact-est; Mean: standard; Standard deviation: standard]

Figure 4: Accuracy, mean and standard deviation for $u = 90$.

Figure 5: Random change locations $u + U_k$ that fluctuate around $u = 70$. The means are $m_{j,k} = 0$ for
$j \le u + U_k$ and $m_{j,k} = 1$ for $j > u + U_k$ for all $k$.


In general multiple change point settings the situation is less clear than in the single 
change point or epidemic settings and the behaviour is rather erratic: The results strongly 
depend on the location and on the magnitude of the changes as well as on the moving 
average parameters - it is possible to find settings where the exact weighting scheme 
outperforms the standard one but also vice versa. 

3.4 Post processing estimated exact weights using regression 

Here, our interest is again in the single change point scenario using $\hat{u}_*$ with the simple
weighting estimate $\hat{w}(i) = \hat{w}^{\text{exact}}(i,n)$ which is based on (2.33). We would like to mention
a possible consistent modification which tends to be beneficial in situations described below
and that may serve as a motivation for further research. In the following we assume that
$w^{\text{exact}}$ is convex since the strictly concave case can be treated analogously.

Based on the results and the corresponding proofs of Section 2 it seems reasonable to
expect that, if $w^{\text{exact}}$ is strictly convex and smooth, which is the case for panels based
on Example 2.7 or on Example 2.8 (with e.g. $\phi > 0$), then modifications of $\hat{w}$ that
are strictly convex, and therefore also less oscillating, should increase the precision of the
resulting change point estimate. Obviously, the estimate $\hat{w}$ is usually not strictly convex
due to the fluctuations around the strictly convex (discrete) function $w$. To obtain a
smoother convex estimate $\hat{w}^{\text{exact-reg}}$ one may post-process the weights $\hat{w}$ using the
well known least squares convex regression. The basic principle is that, given a regression
model

$$\hat{w}(i) = w(i) + \varepsilon_i$$

with a strictly convex function $w(i)$ and some centered noise sequence $\{\varepsilon_i\}$ for $i =
1,\ldots,n-1$, we solve

$$\operatorname*{Minimize}_{\tilde{w}(i),\, g_i} \; \sum_{i=1}^{n-1} \left[\hat{w}(i) - \tilde{w}(i)\right]^2$$

under the convexity restrictions

$$\tilde{w}(j) \ge \tilde{w}(i) + g_i(j - i) \qquad (3.1)$$

where (3.1) holds true for $i, j = 1,\ldots,n-1$ (cf., e.g., Boyd and Vandenberghe (2004)
and Hannah and Dunson (2013)). Clearly, in our situation, these estimates $\tilde{w}(i)$ remain
consistent for $d \to \infty$ if the underlying original estimates $\hat{w}(i)$ were consistent and
therefore strictly convex with probability tending to 1, as $d \to \infty$.
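The paper solves this regression with CVX in MATLAB. Purely as an illustration, and not as the method used there, the following sketch swaps in Dykstra's alternating projection algorithm. It relies on the fact that on the equally spaced grid $i = 1,\ldots,n-1$ the restrictions (3.1) are equivalent to non-negative second differences $\tilde{w}(i-1) - 2\tilde{w}(i) + \tilde{w}(i+1) \ge 0$, so the least squares problem is a projection onto an intersection of half-spaces. The function name and the fixed number of sweeps are assumptions.

```python
def convex_regression(w_hat, sweeps=2000):
    """Approximate least squares projection of w_hat onto convex sequences.

    Dykstra's algorithm: cycle through the half-spaces
    x[m] - 2 x[m+1] + x[m+2] >= 0 and keep one correction vector per constraint.
    """
    n = len(w_hat)
    x = [float(v) for v in w_hat]
    corr = [[0.0, 0.0, 0.0] for _ in range(max(n - 2, 0))]
    for _ in range(sweeps):
        for m in range(n - 2):
            # add back the correction stored for this constraint
            y = [x[m + t] + corr[m][t] for t in range(3)]
            s = y[0] - 2.0 * y[1] + y[2]
            if s < 0.0:
                # project onto the half-space along the normal (1, -2, 1), squared norm 6
                step = s / 6.0
                proj = [y[0] - step, y[1] + 2.0 * step, y[2] - step]
            else:
                proj = y
            corr[m] = [y[t] - proj[t] for t in range(3)]
            x[m:m + 3] = proj
    return x
```

A sequence that is already convex is returned unchanged, and a single bump such as `[0, 1, 0]` is flattened to the constant `1/3`. For larger inputs the result is only approximate and depends on the number of sweeps.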

In our simulations with moving average panels we observe that the weighting schemes
$\hat{w}^{\text{exact-reg}}$ and $w^{\text{exact}}$ yield similar estimation results for $\hat{u}_*$. They outperform
$\hat{w}^{\text{exact}}$ for smaller variances $\sigma^2$, smaller panel numbers $d$ and for parameters $\phi$ that
are closer to $-1$ (cf. Figure 7). This effect is stronger for change points $u$ closer to
$n/2$ and weaker for $u$ closer to $n$. Notice that the estimate $\hat{w}^{\text{exact}}$ outperforms the
“true” $w^{\text{exact}}$ for larger variances and larger panel numbers $d$. This goes in line with the
observations of Subsection 3.1.




Figure 6: Epidemic change with change points at $\{55, 80\}$ and a jump of size $+1$ in all panels. We
consider $\sigma^2 = 1$ with $\theta = 0$ and we do not take common factors into account, i.e. we set $\gamma_k = 0$.
Accuracy denotes now the probability that all change points are estimated correctly. Notice that the curves
for $\phi = -2, -3.5$ are both zero in the left figure.

Footnote (to the simulations in Subsection 3.4): The computation of the regression weights is carried out using the MATLAB software “CVX: A system
for disciplined convex programming” (see http://cvxr.com/cvx/).

[Figure 7 panels: (a) $d = 10$, $\phi = -0.3$; (b) $d = 200$, $\phi = -0.3$; (c) $d = 10$, $\phi = 0.5$; (d) $d = 200$, $\phi = 0.5$]

Figure 7: We consider panels of length $n = 50$ with a change point at $u = 37$ and with $\theta = 0$. We do
not take common factors into account, i.e. we set $\gamma_k = 0$.


4 Conclusion 

In this article we showed the connection of the total variation denoising approach of 
Bleakley and Vert (2011a) to the classical weighted CUSUM estimates. We generalized 
the consistency results of Bleakley and Vert (2011a) to panels of time series in the fixed 
$n$ and $d \to \infty$ setting under mild assumptions and studied consistency properties with
respect to a well-known class of weighting schemes. Doing so, we also defined the criterion 
of perfect estimation, which is fulfilled in the independent setting if one uses the standard 
weighting, and showed that generally only a suitable covariance-dependent modification 
of these weights ensures this criterion. Thus, corresponding estimates outperform the 
standard weighting in various situations. We discussed appropriate estimation of these 
new weights and confirmed our results in a detailed simulation study. Moreover, we 
discussed the implications and possible advantages for the multiple change points and the 
random change point scenarios as well. 
5 Proofs


We start by proving Proposition 2.4, which requires some preliminary considerations and
some key facts from Bleakley and Vert (2011a). For theoretical investigations and also for
practical purposes the authors pick up the idea of Harchaoui and Lévy-Leduc (2008) and
reformulate the minimization Problem (2.1) as a group fused LASSO.

Using the weights $w(i,n)$, they introduce a fixed design matrix $D$ with

$$D_{i,j} = \begin{cases} w(j,n), & i > j, \\ 0, & \text{else} \end{cases} \qquad (5.1)$$

and set

$$\beta_{i,\cdot} = \frac{U_{i+1,\cdot} - U_{i,\cdot}}{w(i,n)},$$

which can be compactly rewritten as $U = 1U_{1,\cdot} + D\beta$ where $1 = [1,\ldots,1]^T$. Thereby,
the original problem (2.1) transforms to

$$\operatorname*{Minimize}_{\beta \in \mathbb{R}^{(n-1)\times d}} \; \frac{1}{2}\|\bar{Y} - \bar{D}\beta\|_F^2 + \lambda\sum_{i=1}^{n-1}\|\beta_{i,\cdot}\| \qquad (5.2)$$

with the same $\lambda > 0$, where $\bar{Y}$ and $\bar{D}$ are the column-wise centered matrices $Y$ and
$D$, respectively. Let $\hat{\beta}(\lambda)$ denote the solution of (5.2). The solution $\hat{U}$ of (2.1) can be
recovered via $\hat{U} = 1\hat{\gamma} + D\hat{\beta}$ with $\hat{\gamma} = 1^T(Y - D\hat{\beta})/n$. The indices of non-zero rows
of the matrix $\hat{\beta}$ correspond to the change point set $\mathcal{E}$ in (2.3) via (5.1) and it holds that
$\mathcal{E}(\lambda) = \{u \mid \hat{\beta}_{u,\cdot}(\lambda) \ne 0\}$.

The crucial observation for further theoretical analysis is that $\beta$ minimizes (5.2), for
any fixed $\lambda$, if it fulfills the necessary and sufficient Karush-Kuhn-Tucker (KKT) conditions:

$$G_i = \lambda B_i \quad \forall\, \beta_{i,\cdot} \ne 0, \qquad \|G_i\| \le \lambda \quad \forall\, \beta_{i,\cdot} = 0, \qquad (5.3)$$

for all $i = 1,\ldots,n-1$ with vectors $G_i = \bar{D}_{\cdot,i}^T(\bar{Y} - \bar{D}\beta)$ and $B_i = \beta_{i,\cdot}/\|\beta_{i,\cdot}\|$.

The next proposition formalizes the selection of the regularization parameter $\lambda$ which
was already informally described in the Subsection 2.2.

Proposition 5.1. Consider the random matrix $c = \bar{D}^T\bar{Y}$, set $t_i = \|c_{i,\cdot}\|$ for $i =
1,\ldots,n-1$. Assume $t_{i_1} \le t_{i_2} \le \ldots \le t_{i_{n-1}}$ with $i_k \ne i_r$ for $k \ne r$ and set $M = i_{n-1}$,
$m = i_{n-2}$. Under Assumption 2.1 it holds that:

1. If $\lambda \ge t_M$ then $\hat{\beta} = 0$ solves the KKT system (5.3).

2. If $t_m < \lambda < t_M$ then a random $\lambda_{\min}$ exists such that for any $\lambda_{\min} < \lambda < t_M$ the
$\hat{\beta}$ with rows

$$\hat{\beta}_{i,\cdot} = \begin{cases} \alpha\, c_{M,\cdot}, & i = M, \\ 0, & i \ne M, \end{cases}
\qquad \text{and} \qquad \alpha = \frac{t_M - \lambda}{\bar{D}_{\cdot,M}^T\bar{D}_{\cdot,M}\, t_M} \qquad (5.4)$$

solves the KKT system (5.3).

In the latter case we obtain $\mathcal{E}(\lambda) = \{M\}$, i.e. $\hat{u} = M$.

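Proof of Proposition 5.1. For $\beta = 0$ the conditions of (5.3) simplify to $\lambda \ge t_M$ and the
first statement follows immediately. We turn to the second statement where $t_m < \lambda < t_M$
and in which case

$$G_i = c_{i,\cdot} - \bar{D}_{\cdot,i}^T\bar{D}_{\cdot,M}\,\beta_{M,\cdot}.$$

Therefore, the first equality in (5.3) translates to

$$c_{M,\cdot} = \left[\frac{\lambda}{\|\beta_{M,\cdot}\|} + \bar{D}_{\cdot,M}^T\bar{D}_{\cdot,M}\right]\beta_{M,\cdot}
= \left[\frac{\lambda}{(t_M - \lambda)/(\bar{D}_{\cdot,M}^T\bar{D}_{\cdot,M})} + \bar{D}_{\cdot,M}^T\bar{D}_{\cdot,M}\right]\beta_{M,\cdot}, \qquad (5.5)$$

which is fulfilled by the definition of $\beta_{M,\cdot}$. Here, the latter equality in (5.5) holds true
since the former equality in (5.5) implies

$$t_M = \|c_{M,\cdot}\| = \lambda + \bar{D}_{\cdot,M}^T\bar{D}_{\cdot,M}\,\|\beta_{M,\cdot}\|.$$

For $\lambda \uparrow t_M$ we have $\beta \to 0$ and therefore $\|G_i\| \to t_i < t_M$ for $i \ne M$ which yields the
assertion.

□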
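The second statement of Proposition 5.1 can be verified numerically on a toy example. The sketch below is an illustration under assumptions: panels are stored as `Y[j][k]` (0-based), `w[i-1]` stands for $w(i,n)$, and the function names are hypothetical. It builds the centered design matrix from (5.1), forms the single-row candidate (5.4) and returns the vectors $G_i$ so that the KKT conditions (5.3) can be inspected.

```python
import math

def centered_design(n, w):
    """Column-wise centered D from (5.1); w[j] is the weight of column j + 1."""
    D = [[w[j] if i > j else 0.0 for j in range(n - 1)] for i in range(n)]
    for j in range(n - 1):
        mean = sum(D[i][j] for i in range(n)) / n
        for i in range(n):
            D[i][j] -= mean
    return D

def center_columns(Y):
    n, d = len(Y), len(Y[0])
    means = [sum(Y[i][k] for i in range(n)) / n for k in range(d)]
    return [[Y[i][k] - means[k] for k in range(d)] for i in range(n)]

def single_change_solution(Y, w, lam):
    """Return (u_hat, beta_M, G, t) for the candidate solution (5.4)."""
    n, d = len(Y), len(Y[0])
    D, Yc = centered_design(n, w), center_columns(Y)
    c = [[sum(D[i][j] * Yc[i][k] for i in range(n)) for k in range(d)]
         for j in range(n - 1)]
    t = [math.sqrt(sum(x * x for x in row)) for row in c]
    M = max(range(n - 1), key=lambda j: t[j])
    d_mm = sum(D[i][M] ** 2 for i in range(n))
    alpha = (t[M] - lam) / (d_mm * t[M])
    beta_M = [alpha * x for x in c[M]]
    # G_i = c_i - (D_i^T D_M) beta_M, since only row M of beta is non-zero
    G = []
    for j in range(n - 1):
        dot = sum(D[i][j] * D[i][M] for i in range(n))
        G.append([c[j][k] - dot * beta_M[k] for k in range(d)])
    return M + 1, beta_M, G, t
```

For a noise-free jump after time point 4 in a panel of length 6 and $\lambda$ slightly below $t_M$, the norm of $G_M$ equals $\lambda$ and all other $\|G_i\|$ stay below $\lambda$, as the proposition predicts.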


We are now in the position to provide the short proof for Proposition 2.4.

Proof of Proposition 2.4. Straightforward calculations yield

$$(\bar{D}^T\bar{Y})_{i,k} = -w(i,n)\sum_{j=1}^{i}(Y_{j,k} - \bar{Y}_{n,k})$$

for all $i = 1,\ldots,n-1$ and $k = 1,\ldots,d$. Hence, we obtain $t_i^2 = \|c_{i,\cdot}\|^2 = t(i)$. Since
$t(i)$ has a unique maximum, we know that an appropriate $\lambda$ with $t_m < \lambda < t_M$ may
be selected which, together with Proposition 5.1, completes the proof. □
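The identity above shows that the single change point estimate is the maximiser of a weighted CUSUM statistic. A minimal sketch follows. It assumes the statistic $\|c_{i,\cdot}\|^2$ (a constant rescaling would not change the argmax) and, purely for illustration, weights of the form $((i/n)(1-i/n))^{-\gamma}$; the function names are hypothetical.

```python
def weighted_cusum_statistic(Y, w):
    """t(i) = w(i,n)^2 * sum_k (sum_{j<=i} (Y[j][k] - mean_k))^2 for i = 1..n-1."""
    n, d = len(Y), len(Y[0])
    col_mean = [sum(Y[j][k] for j in range(n)) / n for k in range(d)]
    partial = [0.0] * d
    t = []
    for i in range(1, n):
        for k in range(d):
            partial[k] += Y[i - 1][k] - col_mean[k]
        t.append(w[i - 1] ** 2 * sum(p * p for p in partial))
    return t

def estimate_change_point(Y, w):
    """Smallest maximiser of t(i), returned as a 1-based time index."""
    t = weighted_cusum_statistic(Y, w)
    return 1 + max(range(len(t)), key=lambda i: t[i])

def power_weights(n, gamma):
    """Assumed weight family ((i/n)(1 - i/n))^(-gamma), i = 1..n-1."""
    return [((i / n) * (1.0 - i / n)) ** (-gamma) for i in range(1, n)]
```

For a noise-free jump after time point $u$ both `gamma = 0` and `gamma = 0.5` recover $u$; the schemes differ once dependent noise is added.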

We continue with the proofs for Subsection 2.2.3. 

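Proof of Example 2.8. According to (2.30) and due to independence we have

$$\mathrm{Var}(S_{i,k}(\varepsilon)) = \frac{1}{n}\mathrm{Var}\Big(\sum_{j=1}^{n} a_{i,j}(\eta_{j,k} + \phi\,\eta_{j-1,k})\Big)
+ \frac{\theta^2}{n}\mathrm{Var}\Big(\sum_{j=1}^{n} a_{i,j}(\eta_{j,k-1} + \phi\,\eta_{j-1,k-1})\Big).$$

Straightforward calculations yield, with $\sigma_\eta^2$ denoting the variance of the innovations,

$$\begin{aligned}
\frac{1}{\sigma_\eta^2}\mathrm{Var}\Big(\sum_{j=1}^{n} a_{i,j}(\eta_{j,k} + \phi\,\eta_{j-1,k})\Big)
&= (1 - i/n)^2\left[i(1+\phi^2) + 2(i-1)\phi\right] \\
&\quad + (i/n)^2\left[(n-i)(1+\phi^2) + 2((n-i)-1)\phi\right] \\
&\quad - 2(1 - i/n)(i/n)\phi \\
&= (1 + \phi^2 + 2\phi)\left[(1 - i/n)^2 i + (i/n)^2(n-i)\right] \\
&\quad - 2\phi\left[(1 - i/n)^2 + (i/n)^2 + (1 - i/n)(i/n)\right] \\
&= a(\phi)\,(i(1 - i/n)) - 2\phi,
\end{aligned}$$

where $a(\phi)$ is set in (2.12) and this implies

$$V^2(i)\,c = a(\phi)\,((i/n)(1 - i/n)) - 2\phi/n$$

with $c = \sigma^2/(\sigma_\eta^2(1+\theta^2))$. □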
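The closed form in this proof can be checked with exact rational arithmetic. The sketch below assumes unit innovation variance and, since (2.12) is not reproduced in this section, takes $a(\phi) = 1 + \phi^2 + 2\phi(1 + 1/n)$, which is the form implied by the roots stated in (5.15). Function names are hypothetical.

```python
from fractions import Fraction

def a_coeffs(i, n):
    """Coefficients a_{i,j} from (2.31) for j = 1..n."""
    return [Fraction(n - i, n)] * i + [Fraction(-i, n)] * (n - i)

def quadratic_form_ma1(i, n, phi):
    """sum_{j,l} a_{i,j} a_{i,l} Cov(e_j, e_l) for e_j = eta_j + phi * eta_{j-1}."""
    a = a_coeffs(i, n)
    diag = (1 + phi * phi) * sum(x * x for x in a)           # lag-0 covariances
    off = 2 * phi * sum(a[j] * a[j + 1] for j in range(n - 1))  # lag-1 covariances
    return diag + off

def closed_form(i, n, phi):
    """a(phi) * i * (1 - i/n) - 2 * phi with the assumed a(phi)."""
    a_phi = 1 + phi * phi + 2 * phi * (1 + Fraction(1, n))
    return a_phi * i * (1 - Fraction(i, n)) - 2 * phi
```

Both functions agree exactly for every $i = 1,\ldots,n-1$ and any rational $\phi$, which supports the last equality of the display under the stated assumption on $a(\phi)$.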

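Proof of Theorem 2.9. Using the notation of (2.7) we consider

$$\begin{aligned}
S_{i,k}(Y) &= n^{-1/2}\sum_{j=1}^{i}(Y_{j,k} - \bar{Y}_{n,k}) \\
&= n^{-1/2}\sum_{j=1}^{i}(\varepsilon_{j,k} - \bar{\varepsilon}_{n,k}) + \gamma_k\, n^{-1/2}\sum_{j=1}^{i}(\zeta_j - \bar{\zeta}_n) - n^{1/2}H(i,u)\Delta_k \\
&= S_{i,k}(\varepsilon) + \gamma_k S_i(\zeta) + C_k,
\end{aligned}$$

with non-random $C_k = -n^{1/2}H(i,u)\Delta_k$ and where $S_i(\zeta) = n^{-1/2}\sum_{j=1}^{i}(\zeta_j - \bar{\zeta}_n)$. It
holds that

$$S_{i,k}^2(Y) = |S_{i,k}(\varepsilon) + \gamma_k S_i(\zeta)|^2 + 2C_k(S_{i,k}(\varepsilon) + \gamma_k S_i(\zeta)) + C_k^2$$

and together with the independence of the centered $S_{i,k}(\varepsilon)$ and $S_i(\zeta)$ we get

$$E|S_{i,k}(Y)|^2 = E|S_{i,k}(\varepsilon)|^2 + \gamma_k^2 E|S_i(\zeta)|^2 + C_k^2.$$

Due to part 2 of Assumption 2.5 and due to Assumption 2.6 it holds that $\sum_{k=1}^{d} E|S_{i,k}(\varepsilon)|^2/d \to
V^2(i)\sigma^2$ and that $\sum_{k=1}^{d}\gamma_k^2/d = o(1)$, as $d \to \infty$. Therefore, we get that, as $d \to \infty$,

$$\frac{1}{d}\sum_{k=1}^{d} E|S_{i,k}(Y)|^2 = V^2(i)\sigma^2 + nH^2(i,u)\Delta_d^2 + o(1) \qquad (5.6)$$

with $\Delta_d^2 = \sum_{k=1}^{d}\Delta_k^2/d$ and for each $i = 1,\ldots,n-1$. Now, assume that we already
know that

$$\mathrm{Var}\Big(\frac{1}{d}\sum_{k=1}^{d} S_{i,k}^2(Y)\Big) = o(1) \qquad (5.7)$$

for any $i$. Then, via Chebyshev's inequality, we obtain from (5.6) and (5.7) that, as
$d \to \infty$,

$$\frac{1}{n^2\Delta_\infty^2}\,\frac{t(i)}{d} = \frac{w^2(i,n)}{n\Delta_\infty^2}\,\frac{1}{d}\sum_{k=1}^{d} S_{i,k}^2(Y) \;\xrightarrow{P}\; C(i;u,n,r)$$

for each $i = 1,\ldots,n-1$. Due to the continuous mapping theorem we obtain

$$\frac{1}{n^2\Delta_\infty^2}\,\frac{\max_{i\in S} t(i)}{d} \;\xrightarrow{P}\; \max_{i\in S} C(i;u,n,r),$$

where $S$ is set according to (2.18) and which then completes the proof. (We neglect the
trivial case of $S = \emptyset$.)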

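It remains to show that (5.7) holds true for any $i$ indeed. It is sufficient to show that
all terms

$$\mathrm{Var}\Big(\sum_{k=1}^{d} S_{i,k}^2(\varepsilon)\Big), \qquad (5.8)$$

$$\mathrm{Var}\Big(S_i(\zeta)\sum_{k=1}^{d}\gamma_k S_{i,k}(\varepsilon)\Big) = E(S_i^2(\zeta))\sum_{k,r=1}^{d}\gamma_k\gamma_r E(S_{i,k}(\varepsilon)S_{i,r}(\varepsilon)), \qquad (5.9)$$

$$\mathrm{Var}\Big(S_i^2(\zeta)\sum_{k=1}^{d}\gamma_k^2\Big) = \Big(\sum_{k=1}^{d}\gamma_k^2\Big)^2\,\mathrm{Var}(S_i^2(\zeta)), \qquad (5.10)$$

$$\mathrm{Var}\Big(\sum_{k=1}^{d} C_k S_{i,k}(\varepsilon)\Big) = \sum_{k,r=1}^{d} C_k C_r E(S_{i,k}(\varepsilon)S_{i,r}(\varepsilon)), \qquad (5.11)$$

$$\mathrm{Var}\Big(S_i(\zeta)\sum_{k=1}^{d}\gamma_k C_k\Big) = \Big(\sum_{k=1}^{d}\gamma_k C_k\Big)^2\,\mathrm{Var}(S_i(\zeta)) \qquad (5.12)$$

are of order $o(d^2)$ because the mixed covariance terms can be neglected due to the
Cauchy-Schwarz inequality. The fourth moments of any $\zeta_i$ are finite, hence $E(S_i^2(\zeta))$
and $\mathrm{Var}(S_i^2(\zeta))$ are finite too. The terms (5.8), (5.9) and (5.11) are of order $o(d^2)$ which
follows from Assumptions (2.16) and (2.17) if we take

$$\mathrm{Var}\Big(\sum_{k=1}^{d} S_{i,k}^2(\varepsilon)\Big) = n^{-2}\sum_{j,l,q,m=1}^{n} a_{i,j}a_{i,q}a_{i,l}a_{i,m}\sum_{k,r=1}^{d}\mathrm{Cov}(\varepsilon_{j,k}\varepsilon_{q,k},\, \varepsilon_{l,r}\varepsilon_{m,r})$$

and

$$E(S_{i,k}(\varepsilon)S_{i,r}(\varepsilon)) = n^{-1}\sum_{j,l=1}^{n} a_{i,j}a_{i,l}\,E(\varepsilon_{j,k}\varepsilon_{l,r})$$

with $a_{i,j}$ defined in (2.31), into account. The term in (5.10) is of order $o(d^2)$ in view of
(2.8) and similarly (5.12) is of order $o(d^2)$ via the Cauchy-Schwarz inequality and again
due to (2.8). □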
Proof of Theorem 2.11. The critical function $C(\cdot\,;u,n,r)$ is positive. It has generally a
maximum at $i = u$ for all ratios $r > 0$ if and only if

$$1 \le \frac{C(u;u,n,r)}{C(i;u,n,r)} \qquad (5.13)$$

holds true for all $r > 0$, $i = 1,\ldots,n-1$. This can only be fulfilled if

$$1 \le \lim_{r\to\infty}\frac{C(u;u,n,r)}{C(i;u,n,r)} = \left[\frac{w(u,n)V(u)}{w(i,n)V(i)}\right]^2 \qquad (5.14)$$

holds true for every $i = 1,\ldots,n-1$ and the positive weights $w(i,n) = a/V(i)$ from
(2.21) fulfill these constraints.

If (2.21) does not hold, then either $a < 0$ and thus $w(i,n)$ is negative or $w(i,n)V(i)$
is not a constant function in $i$ and thus

$$1 < \left[\frac{w(q,n)V(q)}{w(p,n)V(p)}\right]^2$$

now holds true for some $q \ne p$. The former contradicts the positivity constraint and the
latter contradicts condition (5.14) for $u = p$, $i = q$ and therefore also (5.13) for some
$r > 0$. □


Proof of Theorem 2.12. It holds $w^2(i,n)V^2(i) = 1$. Hence, for any $r > 0$, a unique
maximum of $C(i;u,n,r)$ at $u$ is equivalent to

$$0 < C(u;u,n,r) - C(i;u,n,r) =
\begin{cases}
w^2(u,n)(u/n)^2(1-u/n)^2 - w^2(i,n)(i/n)^2(1-u/n)^2, & i < u, \\
w^2(u,n)(u/n)^2(1-u/n)^2 - w^2(i,n)(u/n)^2(1-i/n)^2, & i > u
\end{cases}$$

for all $i \ne u$. Due to the symmetry of $V(i)$ this is equivalent to

$$\frac{V(n-i)}{V(n-u)} = \frac{V(i)}{V(u)} = \frac{w(u,n)}{w(i,n)} >
\begin{cases}
i/u, & i < u, \\
(n-i)/(n-u), & i > u
\end{cases}$$

and the assertion follows. □


Proof of Lemma 2.13. Set $f_u(i) = V(i)/V(u)$, $g_u(i) = i/u$ and observe that
$f_u(u) = V(u)/V(u) = 1 = u/u = g_u(u)$.

Now, assume that $f_u(1) = V(1)/V(u) > 1/u = g_u(1)$. Since $f_u(i)$ is strictly concave in
$i$ and $g_u(i)$ is linear in $i$ the assertion follows immediately. □

Proof of Remark 2.14. We have to distinguish the two cases $a(\phi) > 0$ and $a(\phi) \le 0$. In
the first case $V(i)$ is strictly concave and we may use Lemma 2.13. Simple calculations
show

$$\begin{aligned}
c\,(V^2(1) - V^2(u)/u^2) &= (un)^{-2}(u-1)\left((u + u\phi^2 - 2\phi)n + 2\phi u\right) \\
&= (un)^{-2}(u-1)\left(\phi^2 nu - 2\phi(n-u) + nu\right)
\end{aligned}$$

for some $c > 0$ and it is easy to check that $V^2(1) - V^2(u)/u^2 > 0$ holds true (for $\phi \ge 0$
we check via the first equality and for $\phi < 0$ via the second equality). The latter case,
$a(\phi) \le 0$, occurs only for negative $\phi$ with

$$-1 - n^{-1} - (2n^{-1} + n^{-2})^{1/2} \le \phi \le -1 - n^{-1} + (2n^{-1} + n^{-2})^{1/2}. \qquad (5.15)$$

In this case it is sufficient to check that

$$h(t) := \left\{a(\phi)\,((t/n)(1 - t/n)) - 2\phi/n\right\}/t^2$$

is strictly decreasing on $t \in [1, n-1]$. Hence, $h(i)/c = V^2(i)/i^2$ with $c = \sigma^2/(\sigma_\eta^2(1+\theta^2))$
is strictly decreasing too. It holds that

$$n t^3\,\partial_t h(t) = -a(\phi)\,t + 4\phi$$

and a sufficient condition for $h(t)$ to be strictly decreasing is that $-a(\phi)t + 4\phi < 0$ holds
true. This condition is fulfilled for any $t \in [1, n-1]$ whenever it is fulfilled for $t = n$.
The latter is equivalent to $n(\phi^2 + 2\phi + 1) > 2\phi$ which always holds true since $\phi < 0$.
Finally, $V(i) > 0$ follows immediately from (2.30). □

Proof of Theorem 2.16. We assume that $u \ge n/2$, i.e. $u^* = u$, and set $w = w^{\text{weighted}}$.
(The case $u < n/2$ follows by symmetry.) We consider the case $i \le u$ first and define

$$C(x,r) = F(x)\,r + G(x)$$

with

$$F(x) = w^2(x,n)V^2(x), \qquad V^2(x) = (x/n)(1 - x/n), \qquad G(x) = w^2(x,n)\,((x/n)(1 - u/n))^2$$

on $[0,n)$, i.e. $C(x,r) = C(x;u,n,r)$ for $x \in \{1,\ldots,u\}$. For convenience we suppress
the dependence on $u$ and $n$. It is easy to check that $G(x)$ is strictly increasing with
$G(0) = 0$, that $F(x)$ is strictly concave on $[0,n]$ and symmetrical with respect to
$x = n/2$ and that $F(0) = 0 = F(n)$ holds true. Further, $C(x,r)$, as a function of $x$,
has a unique maximum at $u$ if and only if $C(x,r) < C(u,r)$ for any $x \ne u$. Now,
$C(x,r) < (>)\, C(y,r)$, $x < y$ and $n/2 < y$ is equivalent to

$$r \begin{cases} < (>)\, R(y,x), & n-y < x < y, \\ > (<)\, R(y,x), & 0 \le x < n-y \end{cases}$$

and simply $G(x) < (>)\, G(y)$ if $x = n-y$, where

$$R(y,x) := \frac{G(y) - G(x)}{F(x) - F(y)} \begin{cases} > 0, & n-y < x < y, \\ < 0, & 0 \le x < n-y. \end{cases} \qquad (5.16)$$

The latter holds true because $G(x)$ is strictly increasing on $[0,y]$, i.e. $G(y) - G(x) > 0$
is strictly decreasing in $x$, and because of the properties of $F$ described above. In
the following we assume that $y > n/2$ and $x < y$. Since $G(x) < G(y)$ we know
that $C(n-y,r) < C(y,r)$ for any $r$ and since $F$ is symmetric we also conclude that
$R(y,x) > R(y,n-x)$ for $n-y < x < n/2$. Altogether, this implies that $C(i,r) < C(u,r)$
for all (discrete) $0 < i < u$ and $r > 0$ if and only if

$$0 < r < \mathcal{R} = \min_{n/2 \le i < u} R(u,i).$$

The function $C(x;u,n,r)$ is obviously strictly decreasing for $n/2 \le u \le x < n$ and the
claim follows. □

Proof of Theorem 2.17. We restrict our considerations to $u \ge n/2$. The case $u < n/2$
follows by symmetry. In Bleakley and Vert (2011a, Theorem 2), i.e. for $\gamma = 0$, it is used
that $C(i;u,n,r)$ has a global maximum at $u$ only if $C(u;u,n,r) > C(u-1;u,n,r)$.
This does not hold true in the case of $\gamma \in (0,1/2)$ and a global maximum can differ from
$u$ even though $C(u;u,n,r) > C(u-1;u,n,r)$ holds true. However, we will see that this
situation cannot occur if $s \in (1/2 + 1/n, B(\gamma)]$.

We use the notation from the proof of Theorem 2.16. As mentioned there, it is sufficient
to consider the case $i \le u$. In this case we know from the proof of Theorem
2.16 that a possible local maximum of $C(x,r)$ for $x \in [0,u]$ can only occur at some
$x_{\max} \in [n/2, u]$. Moreover, using basic analysis we know that

$$\lim_{r\to\infty}\frac{C(x,r)}{C(y,r)} = \frac{F(x)}{F(y)}$$

for any $0 < x, y < n$. Due to strict concavity of $F$ we know that for any $0 < \delta < 1$ it
holds that $F(n/2) > F(n/2 \pm \delta)$. That is, for sufficiently large $r$, a local maximum of
$C(x,r)$ occurs within $[n/2 - \delta, n/2 + \delta] \cap [n/2, n] = [n/2, n/2 + \delta]$.

Now, we compute the rescaled first derivative of $C(x,r)$ on $[0,n)$ which will be denoted
by

$$P(x,r) := (x/n)^{2\gamma}(1 - x/n)^{2\gamma+1}\,\partial_x C(x,r).$$

$P(x,r)$ can be evaluated to a second order polynomial in $x$ and for $x \in (0,n)$ we
know that $\partial_x C(x,r) = 0$ if and only if $P(x,r) = 0$. Furthermore, since $C(0,r) = 0$
and $C(x,r) \to \infty$ as $x \uparrow n$ for any $r > 0$ we may have, in case of $\partial_x C(x,r) = 0$,
either only a saddle point or a maximum and a minimum must occur simultaneously at
some $0 < x_{\max} < x_{\min} < n$. We also know from previous considerations that $x_{\max} \ge n/2$.

The discriminant $D(r)$ of $P(x,r)$ is a second order polynomial in $r$ with roots

$$r_{1,2} = \frac{-(2\gamma + 2) \pm 4\gamma^{1/2}}{2\gamma - 1}\,(1 - u/n)^2.$$

$D(r)$, as a second order polynomial, must be positive for $r > r_2$, where $r_2 \ge r_1$. Recall
that $C(x,r)$ has a local maximum within $[n/2, n/2 + \delta)$ for any $0 < \delta < 1$ and all
sufficiently large $r$. Otherwise, $D(r)$ would be negative for $r > r_2$ and we would have
no extrema of $C(x,r)$ in case of large $r$.

The solution of $P(x,r_2) = 0$ is given by $x^* = nB(\gamma) \in (n/2, n)$ which is unique and therefore
must be a saddle point of $C(x,r_2)$. For $r_1 < r < r_2$ a real solution to $P(x,r) = 0$
does not exist and therefore $C(x,r)$ does not have any extrema on $(0,n)$. For $0 < r' < r_1$
real solutions do exist but the corresponding roots of $P(x,r')$ cannot correspond to a
maximum or a minimum of $C(x,r')$ on $(0,n)$ as discussed in the following. Assume that
it is not a saddle point, then we would have a maximum and a minimum because they must
occur simultaneously at some $n/2 \le x_{\max} < x_{\min} < n$, i.e. $C(x_{\max},r') > C(x_{\min},r')$.
This implies $r' > R(x_{\max},x_{\min})$ and therefore $C(x_{\max},r) > C(x_{\min},r)$ for any $r > r'$,
which contradicts the fact of no extrema for $r_1 < r < r_2$. We did not exclude the possibility
of a saddle point at $x_s \in (0,n)$ because for our conclusions it will not cause any
problems as long as the function remains strictly increasing on $(0,n)\setminus\{x_s\}$.

Assume that $\partial_x C(x_0,r_0) = 0$ for some $n/2 < x_0 < n$, $r_0 > 0$. For any $\varepsilon > 0$ we
have

$$\partial_x C(x, r_0 + \varepsilon) = \partial_x C(x, r_0) + \varepsilon\,\partial_x F(x)$$

with $\partial_x F(x) < 0$ for $x \in (n/2, n)$ and we know that

$$\partial_x C(x,r_0) \begin{cases} = 0, & x = x_0, \\ > 0, & 0 < x < n,\ x \ne x_0 \text{ and } x_0 \text{ is a saddle point.} \end{cases}$$

The first equality ensures that $\partial_x C(x_0, r_0 + \varepsilon) < 0$ for any $\varepsilon > 0$ and that $C(x, r_0 + \varepsilon)$ has
local extrema $n/2 < x_{\max} < x_0 < x_{\min}$. The second inequality ensures for saddle points
$x_0$ that, for any $0 < \delta < 1$ we can find an $\varepsilon > 0$ such that $x_{\max}, x_{\min} \in (x_0 - \delta, x_0 + \delta)$.
Moreover, we know that $x_{\max} \downarrow n/2$ and that $x_{\min} \uparrow n$ as $r \uparrow \infty$.

The properties discussed above ensure that given $n/2 + 1 < u \le nB(\gamma)$ it suffices
to compare $C(u;u,n,r)$ and $C(u-1;u,n,r)$ to decide whether a maximum is at $u$ or
not. The remaining assertions follow now by simple analysis. □
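The role of the discriminant roots can be illustrated numerically. The sketch below is not taken from the paper: it assumes $F$ and $G$ as in the proof of Theorem 2.16 with weights $((x/n)(1-x/n))^{-\gamma}$, works on the rescaled axis $t = x/n$, $s = u/n$, and writes the rescaled derivative as the quadratic $a_2 t^2 + a_1 t + a_0$ with coefficients derived under these assumptions. The double root at $r = r_2$ then gives the saddle location $x^*/n$; the closed form $(1+\sqrt{\gamma})/(1+2\sqrt{\gamma})$ used in the check is likewise derived here and is not quoted from (2.26).

```python
import math

def derivative_polynomial(gamma, s, r):
    """Coefficients (a2, a1, a0) of the rescaled derivative of C in t = x/n."""
    kappa, q = 1.0 - 2.0 * gamma, (1.0 - s) ** 2
    a2 = 2.0 * kappa * (r - q)
    a1 = -3.0 * kappa * r + (1.0 + kappa) * q
    a0 = kappa * r
    return a2, a1, a0

def discriminant_roots(gamma, s):
    """Roots r1 <= r2 of the discriminant, following the display for r_{1,2}."""
    q = (1.0 - s) ** 2
    base, step, denom = -(2.0 * gamma + 2.0), 4.0 * math.sqrt(gamma), 2.0 * gamma - 1.0
    roots = [(base + step) / denom * q, (base - step) / denom * q]
    return min(roots), max(roots)

def saddle_location(gamma, s):
    """Double root of the quadratic at r = r2, i.e. the saddle point x*/n."""
    _, r2 = discriminant_roots(gamma, s)
    a2, a1, _ = derivative_polynomial(gamma, s, r2)
    return -a1 / (2.0 * a2)
```

Under these assumptions the saddle location does not depend on $s$ and equals 3/4 for $\gamma = 1/4$ and 1 for $\gamma = 0$, in agreement with the values $B(1/4) = 3/4$ and $B(0) = 1$ quoted in the text.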

Proof of Proposition 2.18. As before, we restrict our considerations to $u > n/2$; the 
case $u < n/2$ follows by symmetry. Using the notation of Theorem 2.16 we consider the 
quantities $R(u, x)$ with a continuous argument $x \in (n - u, u)$. 

The case $\zeta \in (1/2, 3/4]$ is already shown in (2.28) in Theorem 2.17 and we continue 
with $\zeta \in (3/4, 1)$. If $nB(\gamma) < u < n$, the properties in the proof of the previous 
theorem ensure that, for sufficiently large $n$ and for $\gamma \in (0, 1/2)$, the differentiable function 
$R(u, x)$ must have a local minimum at $x_{\min} \in [n/2, nB(\gamma)]$ such that $C(u, r) > C(x, r)$ 
holds true for any $x < u$ in case of $r < R(u, x_{\min})$, and $C(u, r) < C(x, r)$ holds 
true for some $x < u$ in case of $r > R(u, x_{\min})$. Some tedious but straightforward 
calculations for $\gamma = 1/4$ allow us to solve $\partial_x R(u, x) = 0$ explicitly and to identify 
the minimum at $x_{\min} = n/2 + (n^2 - nu)^{1/2}/2$. Since $x_{\min}$ is not necessarily in $\mathbb{N}$, 
we consider $R(sn, x_{\min} + \delta)$ for any $\delta \in [-1, 1]$, where the limits in $n$ equal (2.29), and 
(using the mean value theorem) observe that this convergence is uniform in $\delta$. Therefore, 
$R(sn, \lfloor x_{\min} \rfloor)$ and $R(sn, \lceil x_{\min} \rceil)$ have the same limits and the assertion follows since 
$\eta(s, 1/4)$ corresponds to the former or the latter for each $n$. Finally, the smoothness 
properties follow on applying l'Hôpital's rule. □ 
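
The mean value step used for the uniformity claim can be written out as follows. This is only a sketch: it assumes that R(sn, x) is differentiable in x on the interval of length two around the minimizer, and that the supremum of the derivative over this interval vanishes as n grows, which would have to be checked from the explicit form of R.

```latex
% Sketch only: assumes R(sn, \cdot) is differentiable on [x_{\min} - 1, x_{\min} + 1].
\[
  R(sn, x_{\min} + \delta) - R(sn, x_{\min})
    = \delta \, \partial_x R(sn, \xi_\delta),
  \qquad \xi_\delta \text{ between } x_{\min} \text{ and } x_{\min} + \delta,
\]
\[
  \sup_{\delta \in [-1, 1]} \bigl| R(sn, x_{\min} + \delta) - R(sn, x_{\min}) \bigr|
    \le \sup_{|x - x_{\min}| \le 1} \bigl| \partial_x R(sn, x) \bigr| .
\]
```

If the right-hand side tends to zero as n tends to infinity, then the values at the floor and at the ceiling of the minimizer share the limit of the value at the minimizer itself.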




Acknowledgment 


The author wishes to thank Prof. J. G. Steinebach for helpful comments and Christoph 
Heuser for suggestions to the proof of Theorem 2.17. The author is also thankful for 
the valuable comments and suggestions of the anonymous referees that helped to improve 
the quality of this paper. This research was partially supported by the Friedrich Ebert 
Foundation, Germany. 

References 

Aue A., Horvath L. (2013) Structural breaks in time series. J. Time Ser. Anal., 34(1):1-16 

Bai J. (2010) Common breaks in means and variances for panel data. J. Econom., 157:78-92 

Bickel P. J., Levina E. (2008) Regularized estimation of large covariance matrices. Ann. Stat., 36(1):199-227 

Bleakley K., Vert J. P. (2010) Fast detection of multiple change-points shared by many signals using group 
LARS. Advances in Neural Inform. Process. Syst., 23:2343-2352 

Bleakley K., Vert J. P. (2011a) The group fused LASSO for multiple change-point detection. 
arXiv:1106.4199v1, 1-25 

Bleakley K., Vert J. P. (2011b) The group fused LASSO for multiple change-point detection. Technical 
report HAL-00602121, 1-25 

Boyd S., Vandenberghe L. (2004) Convex Optimization. Cambridge University Press, New York 

Csorgo M., Horvath L. (1997) Limit Theorems in Change-Point Analysis. Wiley, Chichester 

Frick K., Munk A., Sieling H. (2014) Multiscale change point inference. J. R. Stat. Soc. Ser. B Stat. 
Methodol., 76(3):495-580 

Hadri K., Larsson R., Rao Y. (2012) Testing for stationarity with break in panels where the time dimension 
is finite. Bull. Econ. Res., Issue Supplement s1, 64:s123-s148 

Hannah L. A., Dunson D. B. (2013) Multivariate convex regression with adaptive partitioning. J. Mach. 
Learn. Res., 14:3261-3294 

Harchaoui Z., Levy-Leduc C. (2008) Catching change-points with LASSO. Advances in Neural Inform. 
Process. Syst., 20:617-624 

Horvath L., Huskova M. (2012) Change-point detection in panel data. J. Time Ser. Anal., 33:631-648 

Horvath L., Rice G. (2014) Extensions of some classical methods in change point analysis. TEST, 23(2):219-255 

Jandhyala V., Fotopoulos S., MacNeill I., Liu P. (2013) Inference for single and multiple change-points in 
time series. J. Time Ser. Anal., doi:10.1111/jtsa.12035, 1-24 

Joseph L., Wolfson D. B. (1992) Estimation in multi-path change-point problems. Commun. Stat.-Theory 
Meth., 21:897-913 

Joseph L., Wolfson D. B. (1993) Maximum likelihood estimation in the multi-path change-point problem. 
Ann. Inst. Stat. Math., 45:511-530 

Kim D. (2014) Common breaks in time trends for large panel data with a factor structure. Econom. J., 
17:301-337 

Pestova B., Pesta M. (2015) Testing structural changes in panel data with small fixed panel size and 
bootstrap. Metrika, doi:10.1007/s00184-014-0522-8, 1-25 

Torgovitski L. (2015) Panel data segmentation under finite time horizon. Preprint on arXiv:1501.00177v2, 
1-31 

Wu W.B., Pourahmadi M. (2009) Banding sample autocovariance matrices of stationary processes. Statist. 
Sinica, 19:1755-1768 




